[MUD-Dev] voice vs. text

Lo Lo
Mon Feb 21 11:28:28 New Zealand Daylight Time 2000

> I'd have to agree with Dr. Cat here. I think adding real-time voice chat
> to games is going to be inevitable. I don't know that much about the
> technology, but it seems to me that the client could have some sort of
> capacity to alter the sound of your voice, no?

Yep, no problem there.  In short, yes.  Here's an executive summary of the
principles of voice comms, much edited due to failing memory, if you are
interested in the workings:

There are two basic ways of transmitting voice data plus another which
combines the two.

Method A is to do the waveform wholesale.  Grab the input and pipe it down a
line with whatever compression/encoding/error detection algorithms.  Cheap
on transmitting and receiving computations, expensive in terms of capacity
needed to transmit.  Transmitted voice should sound vaguely like what it
oughta be at the other end.
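
To put a rough number on "expensive in terms of capacity", here is a
minimal Python sketch of Method A.  It assumes telephone-style 8 kHz
sampling and mu-law companding down to 8 bits per sample; both are my
choices for illustration, and the function names are made up.  The
companding is the textbook continuous formula, not the exact bit layout
any real codec uses.

```python
import math

SAMPLE_RATE_HZ = 8000   # assumed: telephone-quality sampling
BITS_PER_SAMPLE = 8     # assumed: one companded byte per sample
MU = 255                # mu-law constant (illustrative choice)

def waveform_bitrate(sample_rate_hz=SAMPLE_RATE_HZ, bits=BITS_PER_SAMPLE):
    """Bits per second needed to ship every sample down the line."""
    return sample_rate_hz * bits

def mulaw_encode(x):
    """Compress a sample in [-1.0, 1.0] to an integer code in 0..255."""
    x = max(-1.0, min(1.0, x))
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round((y + 1.0) / 2.0 * 255))

def mulaw_decode(code):
    """Expand an integer code in 0..255 back to a sample in [-1.0, 1.0]."""
    y = code / 255 * 2.0 - 1.0
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
```

Under those assumptions waveform_bitrate() comes to 64000 bits per
second, and the encode/decode work per sample is a handful of arithmetic
operations, which is the "cheap to compute, expensive to transmit"
trade-off.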

Method B is the vocoder method.  We've all heard the pop songs with those
awful vocoders by now, and if you're too old to have noticed, primitive
vocoders sound just like Daleks.  Vocoders require more client power but
less bandwidth at the expense of quality.  Instead of transmitting the
entire waveform, only key attributes of the waveform, like pitch, are
transmitted.  Like I said, the result is poor.
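
As a rough illustration of the "key attributes" idea, the sketch below
boils a frame of samples down to two numbers: a pitch estimate (found by
plain autocorrelation) and the frame energy.  The 8 kHz rate, the 50 to
400 Hz search range and the function names are assumptions for the
example; a real vocoder sends more than this (a spectral envelope, for
one), so treat it as the principle only.

```python
import math

SAMPLE_RATE_HZ = 8000  # assumed sampling rate

def estimate_pitch_hz(frame, sample_rate_hz=SAMPLE_RATE_HZ,
                      min_hz=50, max_hz=400):
    """Estimate pitch from the lag with the strongest autocorrelation."""
    min_lag = sample_rate_hz // max_hz
    max_lag = sample_rate_hz // min_hz
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, min(max_lag, len(frame) - 1) + 1):
        score = sum(frame[n] * frame[n + lag]
                    for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate_hz / best_lag

def analyse_frame(frame):
    """Reduce a frame of samples to the two numbers we would transmit."""
    energy = sum(s * s for s in frame) / len(frame)
    return {"pitch_hz": estimate_pitch_hz(frame), "energy": energy}

# A 200 Hz test tone, 40 ms long (320 samples at 8 kHz).
tone = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE_HZ)
        for n in range(320)]
params = analyse_frame(tone)
```

Here 320 samples collapse into two parameters, which is where the
bandwidth saving comes from; the nested loop over lags is where the
extra client power goes.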

Method C is the hybrid way, use a vocoder for the boring bits like 'ssssss'
which is essentially white noise produced by blowing air thru teeth and use
the waveform for more interesting bits which generally involve the transient

My suspicion is that someone who wants to alter the attributes of the voice
should take Method C and start playing around with the waveforms which are
interesting to the human ear.  However, since there has been very little
demand for altering the voice to sound like cartoon characters, I don't know
too much about what to tweak (aside from modulating the pitch and playback
speed).  The advantage of the above coding schemes in preference to mp3/4 is
that the above are specific to voice and not music.  Also, music tends to be
composed of a spectrum of stuff whilst, with voice comms, there is only one
voice to compress hence you can tailor the algorithm accordingly.
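
The crude pitch/playback-speed tweak can be sketched as simple
resampling.  This is a minimal Python example using linear
interpolation; change_speed and its factor parameter are hypothetical
names, and note that this approach shifts pitch and speed together
rather than independently.

```python
def change_speed(samples, factor):
    """Resample by linear interpolation.

    factor > 1.0 plays faster and higher; factor < 1.0 slower and lower.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    out_len = max(1, int(len(samples) / factor))
    out = []
    for i in range(out_len):
        pos = i * factor                      # position in the input
        left = min(int(pos), len(samples) - 1)
        right = min(left + 1, len(samples) - 1)
        frac = pos - int(pos)                 # distance past `left`
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out
```

A factor of 2.0 gives the chipmunk effect (half the samples, an octave
up); 0.5 gives the slow, deep version.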

If anyone is interested in the above stuff, I do suggest taking a look at
the parametric codecs as opposed to the linear codecs (ie: ACM).  You'll
also need the algorithm to be tolerant of dropped packets, latency, etc, as
with all the other internet apps.
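
One simple way to tolerate dropped packets is to number the frames and
fill any gap with a faded copy of the last frame that was played.  The
sketch below assumes 20 ms frames at 8 kHz and a dictionary keyed by
sequence number; the names and the fade factor are illustrative, not
taken from any particular codec.

```python
FRAME_SAMPLES = 160  # assumed: 20 ms frames at 8 kHz

def conceal_losses(packets, first_seq, last_seq, fade=0.5):
    """Rebuild a continuous stream from packets that may have gaps.

    packets maps sequence number -> list of samples.  A missing frame is
    replaced by the previous output frame scaled by `fade`, so a long
    run of losses decays towards silence instead of looping loudly.
    """
    previous = [0.0] * FRAME_SAMPLES
    stream = []
    for seq in range(first_seq, last_seq + 1):
        if seq in packets:
            frame = list(packets[seq])
        else:
            frame = [s * fade for s in previous]
        stream.extend(frame)
        previous = frame
    return stream
```

Latency is a separate knob: the longer the receiver waits for a late
packet before declaring it lost, the fewer gaps it has to paper over,
at the cost of a slower conversation.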

I do wonder what happened to MetaVoice and MetaFont.  Apart from a lot of
doodah back in 1996/97 when random multiplayer game providers signed on,
there hasn't been a squeak.

For those who don't know what I'm on about, check out:

A review

Their homepage (under construction, maybe at a later date)

A comparison

The last is especially revealing since the MetaVoice codec works at 2.4
kb/s.  From what I know, though, Voxware, who made the product, has stopped
research on voice modulation in favour of normal VoIP trade but will carry
out work on a customer-specific basis (ie: You'll need a lotta money).

There are other problems with voice comms though.  Apart from the HCI aspect
of voice commands being far less snappy than touch commands, brought on due
to the need to detect a definitive pause after the command (processing is
not an issue these days), there are minor issues such as having to be
actually in-character.  To act like a rabbit on valium, you'll need to act
like a rabbit on valium in a voice medium, with or without a tonebox.  I'll
certainly be casting odd looks your way if I notice someone talking to the
computer in that fashion.
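
To show where the sluggishness of voice commands comes from, here is a
minimal sketch of pause detection: a command only counts as finished
once speech has been followed by a run of quiet frames.  The threshold,
the 25-frame pause and the function name are assumptions picked for the
example.

```python
def frames_until_command_end(frame_energies, silence_threshold=0.01,
                             pause_frames=25):
    """Return how many frames pass before a spoken command is accepted.

    The command is accepted once speech has been heard and is followed
    by `pause_frames` consecutive frames below `silence_threshold`.
    Returns None if that never happens.
    """
    heard_speech = False
    quiet_run = 0
    for index, energy in enumerate(frame_energies):
        if energy >= silence_threshold:
            heard_speech = True
            quiet_run = 0
        elif heard_speech:
            quiet_run += 1
            if quiet_run >= pause_frames:
                return index + 1
    return None
```

With 20 ms frames, a 25-frame pause adds half a second to every command
regardless of how fast the machine is, which a key press never has to
pay.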

Plus, with several conversations mixed in, only one microphone and two
speakers, tuning into specific voices gets more difficult, especially as
most PCs aren't supplied with remotely decent speakers.  I suppose it would
not be too difficult to have the mouse cursor allow you to tune into areas
on the screen.
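
The cursor-tuning idea could look something like the sketch below: each
voice is mixed in with a gain that depends on how close its speaker is
to the mouse cursor on screen.  The linear falloff, the 100-pixel focus
radius and the 0.1 floor are all invented for illustration.

```python
import math

def gain_for_speaker(speaker_xy, cursor_xy, focus_radius=100.0, floor=0.1):
    """Full volume under the cursor, falling off linearly to `floor`."""
    distance = math.dist(speaker_xy, cursor_xy)
    closeness = max(0.0, 1.0 - distance / focus_radius)
    return floor + (1.0 - floor) * closeness

def mix(voices, cursor_xy):
    """Mix equally long sample lists, weighting each by cursor proximity.

    voices is a list of (speaker_xy, samples) pairs.
    """
    if not voices:
        return []
    length = len(voices[0][1])
    mixed = [0.0] * length
    for speaker_xy, samples in voices:
        gain = gain_for_speaker(speaker_xy, cursor_xy)
        for i in range(length):
            mixed[i] += gain * samples[i]
    return mixed
```

Keeping a non-zero floor means the other conversations stay faintly
audible, so you can still notice when someone across the room says your
name.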

Which reminds me, whilst I'm here.  One of the issues that seems to be a
recurring nightmare on mud-dev is the difference between intent and result
when issuing commands.  It seems worthwhile to add extra tags/commands to
allow for this intent to be conveyed to the server/client.

It wouldn't be a completely bad idea to expand the command range with very
similar commands that only convey a different tone, eg:

  > say Howdi you lot

Could be typed in as:

  > greet Howdi you lot

I admit this is similar to the mood thing Raph K put up a while ago.  I just
think the general concept is a particularly good one although I don't know
if it would get used by players used to that universal say command.  By the
way, this wouldn't just be restricted to say for text muds.  Just flick thru
the entire dictionary of verbs and attach intent/emotion tags to the verbs;
this would be particularly useful when it comes to interacting with npcs.  I
have to admit, though, that I'm not a fan of pose commands...
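
A minimal sketch of the verb-tagging idea, in Python: a table maps each
typed verb to an underlying command plus an intent tag, so "greet" is
just "say" with a friendly tone attached.  The table contents, the tag
names and parse_command itself are hypothetical; unknown verbs fall
through untouched with a neutral tag.

```python
# Hypothetical verb table: typed verb -> (underlying command, intent tag).
VERB_INTENTS = {
    "say":   ("say", "neutral"),
    "greet": ("say", "friendly"),
    "growl": ("say", "hostile"),
}

def parse_command(line):
    """Split a typed line into command, intent tag and argument text."""
    verb, _, text = line.strip().partition(" ")
    verb = verb.lower()
    command, intent = VERB_INTENTS.get(verb, (verb, "neutral"))
    return {"command": command, "intent": intent, "text": text.strip()}
```

The server (or an npc script) can then branch on the intent field
instead of trying to guess tone from the text of what was said.
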
K Ling Lo					Defence Division
lok at logica.com					Logica UK Limited

All opinions expressed are solely those of the author and not of Logica.

MUD-Dev maillist  -  MUD-Dev at kanga.nu
