[MUD-Dev] [TECH] Voice in MO* - Phoneme Decomposition and Reconstruction

Ted L. Chen tedlchen at yahoo.com
Wed May 15 17:57:53 New Zealand Standard Time 2002

Hans-Henrik Staerfeldt Writes:

> I wonder if the current speech compression algorithms and
> bandwidth isn't getting fast and good enough to both encode,
> decode and transmit several channels over the net. As far as I
> know the bitrate is about 0.77-2 kbps (99% recognized) for
> compressed speech. Then only retransmit at most a few streams to a
> client. If there's a lot of 'background babble', then make an
> unintelligible background babble generator and a single 'babble'
> stream whose 'content' is 'gleaned' off at the server side
> (basically just a low unintelligible mumble generated at the client
> side with only volume control or perhaps general pitch adjusted
> through the server).

Even at 2 kbps, I'm wondering what impact this would have on a
server's bandwidth.  Let's just assume that the average speech
segment is roughly 3 seconds long and encoding is at 2 kbps.  That's
about 750 bytes.  By comparison, the same segment would likely be
80 characters or less when typed.  Decomposing it into phonemes would
compress it down to about 30 bytes (assuming an approximate 3:1
letter->phoneme conversion).  To roughly equal 750 bytes, we would
have to attach about 25 bytes of tags to each phoneme, which I
personally believe is much more than required (conjecture at this
point in time).  In most cases, I'd assume 3 tags (volume, pitch,
speed), which brings it back up to 80 bytes.
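
As a rough check of the arithmetic above, here is a minimal sketch in
Python.  The constant and function names are made up for illustration,
1 kbps is taken as 1000 bits per second, and a phoneme is assumed to
cost one byte.  The last figure is only what an 80 byte total would
imply per phoneme; it is not a measured value.

```python
# Back-of-the-envelope sizes for one speech segment.
# All names here are hypothetical; the numbers come from the estimates above.
SEGMENT_SECONDS = 3    # assumed average speech segment length
BITRATE_KBPS = 2       # upper end of the quoted 0.77-2 kbps range
TYPED_CHARS = 80       # the same segment as typed text
PHONEME_BYTES = 30     # estimated phoneme stream, assuming one byte per phoneme


def voice_bytes(seconds, kbps):
    """Size in bytes of a compressed voice segment (1 kbps = 1000 bit/s)."""
    return seconds * kbps * 1000 / 8


def break_even_bytes_per_phoneme(total_voice_bytes, phonemes):
    """Bytes each phoneme may occupy before it costs as much as raw voice."""
    return total_voice_bytes / phonemes


voice = voice_bytes(SEGMENT_SECONDS, BITRATE_KBPS)
per_phoneme = break_even_bytes_per_phoneme(voice, PHONEME_BYTES)
# Tag bytes per phoneme that would keep the total at the size of typed text.
tag_budget = (TYPED_CHARS - PHONEME_BYTES) / PHONEME_BYTES

print(f"compressed voice: {voice:.0f} bytes")
print(f"break-even size per phoneme: {per_phoneme:.0f} bytes")
print(f"tag budget for an {TYPED_CHARS} byte total: {tag_budget:.2f} bytes/phoneme")
```

Changing SEGMENT_SECONDS or BITRATE_KBPS shows how quickly the gap
between voice and tagged phonemes grows or shrinks.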

My personal experience with MMORPGs has been exclusively client side,
with AO and a little EQ.  However, based on my experience with AO's
separate chat server, even simple text seems to stress it at times.
Easy access to voice lowers the cost of entry to communication and
might therefore increase usage.

I'd be interested if anyone has any data on the usage of chat on
their servers, such as how much bandwidth it currently takes up.

> Since speech gets unintelligible fast if people talk over each other
> anyway, mixing streams reliably with a great number of speakers
> doesn't really pay off. Instead only take the nearest streams and
> add the babble effect.

This is a good idea.  I would add that the client could perhaps do a
little more processing even after the server performs its culling.
For instance, the client could also use the character's POV to
increase/decrease the volume of the nearest streams.
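
To make the split between server-side culling and client-side volume
adjustment concrete, here is a minimal sketch in Python.  Everything in
it is an assumption for illustration: the Speaker type, the function
names, the 2D positions, the limit of 3 streams, and the falloff and
rear_gain values are all hypothetical, not taken from any existing
server or client.

```python
import math
from dataclasses import dataclass


@dataclass
class Speaker:
    name: str
    x: float
    y: float


def cull_streams(listener_xy, speakers, max_streams=3):
    """Server side: keep the nearest speakers, count the rest as babble.

    Returns (speakers whose streams are forwarded, number folded into babble).
    """
    ranked = sorted(speakers, key=lambda s: math.dist(listener_xy, (s.x, s.y)))
    return ranked[:max_streams], max(0, len(ranked) - max_streams)


def pov_gain(listener_xy, facing_radians, speaker, falloff=10.0, rear_gain=0.5):
    """Client side: volume multiplier from distance and facing.

    Speakers straight ahead keep the full distance-based gain; speakers
    directly behind are scaled down to rear_gain of it.
    """
    dx = speaker.x - listener_xy[0]
    dy = speaker.y - listener_xy[1]
    distance = math.hypot(dx, dy)
    distance_gain = 1.0 / (1.0 + distance / falloff)
    if distance == 0:
        return distance_gain  # speaker is on top of the listener
    # Cosine of the angle between the facing direction and the speaker.
    cos_angle = (dx * math.cos(facing_radians)
                 + dy * math.sin(facing_radians)) / distance
    # Map cos_angle from [-1, 1] onto [rear_gain, 1].
    facing_gain = rear_gain + (1.0 - rear_gain) * (cos_angle + 1.0) / 2.0
    return distance_gain * facing_gain


listener = (0.0, 0.0)
crowd = [Speaker("a", 1, 0), Speaker("b", 5, 0),
         Speaker("c", -2, 0), Speaker("d", 50, 0)]
forwarded, babble_count = cull_streams(listener, crowd, max_streams=2)
for s in forwarded:
    print(s.name, round(pov_gain(listener, 0.0, s), 3))
print("speakers folded into babble:", babble_count)
```

The server only needs positions to do the culling, while the facing
adjustment stays on the client, so turning the character does not cost
any extra server bandwidth.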

> Sure, it impinges on the desire for low server bandwidth, but
> really, how low is low nowadays?

> See article;

>   http://imsc.usc.edu/research/NSF_year_five/Speechcom.pdf

Thanks for the article.  At the very least, this might be useful for
direct player to player communication as suggested by JB.


MUD-Dev mailing list
MUD-Dev at kanga.nu