[MUD-Dev] [TECH] Voice in MO* - Phoneme Decomposition and Reconstruction

Steve {Bloo} Daniels bloo at playnet.com
Fri May 31 10:57:02 New Zealand Standard Time 2002

Koster, Raph wrote:

> Actually, full duplex speech whilst mixing multiple voice streams
> and also handling all the other game audio necessary isn't
> trivial, unfortunately.  Many of the solutions out there up to now
> do indeed require one to take turns, or cap at say four voices at
> once. And it can be expensive in CPU.  Lastly, the quality with a
> lot of voices can leave something to be desired--just as the
> quality of a five-way telephone conference call can.
> Don't get me wrong--voice is the future. I am not sure that it is
> quite ready for primetime yet though. It's taken over five years
> for a solution for voice disguising to be developed and deployed,
> for example.

Brace yourselves, I'm getting my wind on. ;-)

As many programmers are fond of saying, you can program anything
with enough money, time and talent.  You're correct, it isn't
trivial.  But it *has* already been done.

If you haven't, you should take a look at Roger Wilco.  It has very
low bandwidth and low CPU usage and is full-duplex.  You choose the
bandwidth you want to allocate to it by identifying the speed of
your net connection - select one lower than what you have and you'll
use lower bandwidth.  The minimum setting, for a 28.8k modem, used
about 2400 baud receiving and 4800 transmitting.  For CPU, it needs a
Pentium 166 or better.
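
To put those figures in perspective, here is a rough sketch of the
arithmetic.  It assumes "baud" above means bits per second and that a
28.8k modem gives a nominal 28,800 bit/s in each direction; the
constant and function names are made up for illustration and are not
anything from Roger Wilco itself.

```python
# Figures quoted above; treating "baud" as bits per second is an assumption.
LINK_BPS = 28_800      # nominal 28.8k modem rate, per direction (assumed)
RECEIVE_BPS = 2_400    # voice traffic coming in
TRANSMIT_BPS = 4_800   # voice traffic going out


def link_share(voice_bps, link_bps=LINK_BPS):
    """Fraction of the link used by voice in one direction."""
    return voice_bps / link_bps


def remaining_bps(voice_bps, link_bps=LINK_BPS):
    """Bits per second left over for game traffic in that direction."""
    return link_bps - voice_bps


if __name__ == "__main__":
    for label, bps in (("receive", RECEIVE_BPS), ("transmit", TRANSMIT_BPS)):
        print(f"{label}: {link_share(bps):.1%} of link, "
              f"{remaining_bps(bps)} bit/s left for the game")
```

Under those assumptions voice takes roughly a twelfth of the link
inbound and a sixth outbound, leaving the rest for the game.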

Loss of performance?  What loss of performance?  In early 1999, I
could have 6 people on a channel with frequent usage and see a frame
rate loss of only 1-3 fps in Quake II on a 200MHz Celeron with a 28.8k
dial-up connection.  With 14 people on in Warbirds, I'd lose 5-8 fps
with frequent simultaneous talking. (But trust me, without good
voice discipline, you don't want more than 4 people on a channel!).
I never saw *any* performance drop in Ultima Online, Everquest,
Asheron's Call, Anarchy Online, or any other online game you can
name - and my friends and I would often stay connected for more
than three hours.
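
For anyone wondering what "mixing multiple voice streams" involves at
its simplest, here is a minimal sketch.  It assumes each talker's
audio has already been decoded into a frame of signed 16-bit PCM
samples, and it just adds the frames together and clamps the result.
The function name and the cap of four voices are illustrative
assumptions; this is not how Roger Wilco or any other product
mentioned here actually does it.

```python
from itertools import zip_longest

INT16_MIN, INT16_MAX = -32768, 32767


def mix_streams(frames, max_voices=4):
    """Mix frames of signed 16-bit PCM samples into a single frame.

    frames is a list of sample lists, one per talker.  Only the first
    max_voices talkers are mixed (an assumed cap); shorter frames are
    padded with silence.
    """
    active = frames[:max_voices]  # ignore talkers beyond the cap
    mixed = []
    for samples in zip_longest(*active, fillvalue=0):
        total = sum(samples)  # naive additive mix
        # Clamp so loud overlapping voices do not wrap around.
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed
```

Clamping is the crudest way to handle overload; when several people
shout at once the sum saturates and distorts, which is one reason
quality drops as more voices pile onto a channel.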

Actually, the most difficult thing about using voice programs with
online games is that some games disable Alt-Tabbing (to make memory
hackers' work a little more difficult). This makes it very difficult
for players who meet in game to change their voice settings.  To
help get around that hurdle, Roger Wilco still offers 'Easy Bake'
integration for developers with actual code samples on the web site.

  Disclaimer: I used to work on Roger Wilco, first for Resounding
  Technology, later for HearMe (formerly MPath/Mplayer) in a
  non-programming capacity.

I wish I could tell you what the never-released version of RW that
was coded in late 1999 was capable of, but you probably wouldn't
believe me, and I believe I still have a duty of confidentiality
with regard to it.

The quality of sound is most dependent on the following, in
descending order of importance:

  - Transmitter's Hardware (Mic and Soundcard - and lack of external

  - Transmitter's Application Settings

  - Transmitter's Mic Position (Mic-Mouth distance and angle)

  - Receiver's Hardware

  - Receiver's Application Settings.

  - Transmitter's Voice (some voice codecs, particularly the one
  Roger Wilco started with in 1998, are optimized for the middle of
  the voice spectrum and squish high-pitched voices into the same
  squeaky range).

Single, biggest problem?  Crappy identification and documentation of
soundcard capabilities from manufacturers.  At least in 1999, the
soundcard industry was plagued by many problems with this.  Some
soundcards of the same name used different chips, sometimes by
different manufacturers, which behaved differently.  Some soundcard
drivers were...um...just bad.  A few even worked better with drivers
for different cards.  Determining whether a card was actually
full-duplex or not was a huge chore.  Often, a cheap SoundBlaster 16
would have 'land-line phone quality', where a top-of-the-line
SoundBlaster Live Pro Platinum with Whistles would be nearly
impossible to configure correctly.

Oh, and did I mention that Roger Wilco has had a working Mac version
since mid-1999?

There are other products out there that also made great strides in
the direction of cheap, easy, voice on the net.

Shadowfactor's BattleCom was built on DirectPlay and offered
something like 16 different codecs that you could choose from for
optimal quality in a given situation, though I always saw
'appreciable' loss of framerate while playing games with it - it was
both more CPU and bandwidth intensive than Roger Wilco, at the time
anyway - but not enough to damage your gaming in any but the
twitchiest of shooters.  Development of it stopped after its
creators, having made themselves Jedi Masters of DirectPlay, were
hired by Microsoft to help build Game Voice.

FireTalk and VoiceStream also made early attempts.  TeamSound is out
there now.  There may be others, but since my RogerWilco works
perfectly, I haven't had a reason to change.  :-)

As for voice disguising, what the voice programmers I know call
"Voice Processing", the reason this hasn't been implemented by
anyone yet is that there is little short-term financial incentive
to make it happen.  If you can pay the right programmer, it is just
a matter of money and time.  I think it would be less than a year,
but it would take the right programmer fully dedicated to it.

The business model for a voice-product-based company of this sort is
rather difficult.  What it needs is a company with a broader
business base to fund it.  This is six-figure programmer salary
level development - if you want it done right, anyway.  :-)

Forgive me for going on so much, but I just wanted to share what I
know and make the point that Voice is *Not* the Future -- Voice is
the *Now* - If You Want it.

If you have any other questions about RogerWilco or would like to
contact the programming geniuses behind me, contact me off list.

(formerly of Roger Wilco)

MUD-Dev mailing list
MUD-Dev at kanga.nu