[MUD-Dev] [TECH] Voice in MO* - Phoneme Decomposition and Reconstruction
johnbue at msn.com
Fri May 10 10:03:23 New Zealand Standard Time 2002
Ted L. Chen writes:
[snip of phoneme discussion]
> So, after all is said and done, has anyone attempted to do
> anything similar to this or thought about it in depth? My
> personal background comes more from the text-to-speech side than
> anything specifically related to MO*'s so I might have missed
> something that someone who has a deeper familiarity with the
> technical side of MO*'s would catch.
I've thought about it and I'm sure a number of others have as well.
Personally, I consider speech-to-text (STT) and text-to-speech (TTS)
to be black-box problems that others are tackling. What I want from
those two things really boils down to the following:
1. The ability to capture continuously-spoken language or
conventionally-written text into a compact form.
2. The ability to convert that compact form into either
continuously-spoken language or conventionally-written text.
Whether the input is spoken or typed, inflection, tonality, and
similar cues should be part of what the compact form can represent.
As an example, if I type "How are YOU today?", or I type "How are
you today?!?", the compact form should be storing two somewhat
different representations, just as if I say the questions
differently. And the output of each should be representative of
what was typed/stated, regardless of whether it is presented as text
or speech. Text is obviously capable of a smaller spectrum of
inflection and such, but what it is capable of should be retained.
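To make the two requirements concrete, here is a minimal sketch of a compact form built from typed text only. Everything in it is an assumption made for illustration: the names Token, Utterance, encode_text, render_text and render_speech_hints are hypothetical, the rule "ALL CAPS means stress" is just one possible reading of written emphasis, and the <emph> markup is made up rather than taken from any real TTS engine. The STT side is left out entirely, since that is the black box.

```python
from dataclasses import dataclass
from typing import Tuple

TERMINALS = ".!?"  # characters treated as terminal punctuation (assumption)


@dataclass(frozen=True)
class Token:
    word: str               # as typed, or lower-cased if it was stressed
    stressed: bool = False  # True when the word was typed in ALL CAPS


@dataclass(frozen=True)
class Utterance:
    tokens: Tuple[Token, ...]
    ending: str = ""        # terminal punctuation run, e.g. "?" or "?!?"


def encode_text(text: str) -> Utterance:
    """Capture conventionally-written text into the compact form."""
    body = text.strip()
    stripped = body.rstrip(TERMINALS)
    ending = body[len(stripped):]
    tokens = []
    for w in stripped.split():
        # Single letters such as "I" are not counted as stressed.
        if len(w) > 1 and w.isupper():
            tokens.append(Token(w.lower(), stressed=True))
        else:
            tokens.append(Token(w))
    return Utterance(tuple(tokens), ending)


def render_text(u: Utterance) -> str:
    """Convert the compact form back into conventionally-written text."""
    words = [t.word.upper() if t.stressed else t.word for t in u.tokens]
    return " ".join(words) + u.ending


def render_speech_hints(u: Utterance) -> str:
    """Produce a hypothetical hint string a TTS back end could consume."""
    words = [
        f"<emph>{t.word}</emph>" if t.stressed else t.word.lower()
        for t in u.tokens
    ]
    return " ".join(words) + f" [ending={u.ending}]"


a = encode_text("How are YOU today?")
b = encode_text("How are you today?!?")
print(render_text(a))          # How are YOU today?
print(render_text(b))          # How are you today?!?
print(render_speech_hints(a))  # how are <emph>you</emph> today [ending=?]
```

The two questions end up as different Utterance values, and each renders back to what was typed. A real compact form would need far more than a stress flag and a punctuation run, but the shape of the round trip is the point.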
The goal here is to have players both typing and speaking to the
program, with the information efficiently conveyed to those who
should receive it, to be output as written text or spoken word as
desired by the receiver. I'll skip over how I'd use the technology,
although that is the more interesting challenge to me.
> With a TTS, it is quite possible to expressively generate
> synthesized speech but it currently requires hand coding a lot of
> tags into the stream and at the phoneme level.
And as such, it would fail the 'conventionally-written' text
requirement. Existing expressiveness in written text should be
relied upon. Typing is only going to be used by those who are
unable to speak, whether due to physical impediment or due to
circumstances such as not wanting to annoy people nearby who are not
playing the game. In any case, we don't want to make conversational
input slower than it is today.
> Automatically generating this data from the user for the express
> purpose of pumping it back out through a TTS synthesizer is
> something that I'm not sure anyone has focused on. The use of
> phonemes in speech processing has mainly been used in application
> to compress real voice data. These compression techniques
> however, intend to preserve the voice, which is ironically
> something MO*'s may not want to preserve. The MO* needs to only
> use a small (and somewhat) established subset of this research.
I believe that both original speech and manufactured speech are
needed. Original speech transport is needed when players are
speaking to players (telephone). Manufactured speech is needed when
characters are speaking to characters (acting). I want both in the
same game so that I can have a clear separation of in-game and
out-of-game conversations available to players. If I want to talk
about baseball, I can do it via my own voice. If I want to have my
character discuss the balance of its weapon, I can do it via my
character's voice. Note that my own voice can be sent to any player
in the game willing to receive it, while my character's voice is
limited to how far it carries in the game environment.
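The difference between the two channels can be sketched as a routing rule. This is only an illustration under stated assumptions: Player, player_voice_recipients, character_voice_recipients and the carry_distance parameter are hypothetical names, positions are flat 2D coordinates, and "how far it carries" is reduced to a simple straight-line distance cutoff.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Player:
    name: str
    position: Tuple[float, float]      # where the character stands in the world
    accepts_player_voice: bool = True  # willing to receive out-of-game voice


def player_voice_recipients(sender: Player, players: List[Player]) -> List[str]:
    """Out-of-game voice: reaches any other player willing to receive it."""
    return [
        p.name for p in players
        if p is not sender and p.accepts_player_voice
    ]


def character_voice_recipients(
    sender: Player, players: List[Player], carry_distance: float
) -> List[str]:
    """In-game voice: reaches only characters within carrying distance."""
    return [
        p.name for p in players
        if p is not sender
        and math.dist(sender.position, p.position) <= carry_distance
    ]


alice = Player("alice", (0.0, 0.0))
bob = Player("bob", (3.0, 4.0))                                  # 5 units away
carol = Player("carol", (100.0, 0.0), accepts_player_voice=False)
world = [alice, bob, carol]

print(player_voice_recipients(alice, world))           # ['bob']
print(character_voice_recipients(alice, world, 10.0))  # ['bob']
```

Carol is left out of the first list because she declined player voice, and out of the second because her character is too far away. A real game would presumably account for walls, shouting versus whispering, and so on.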
I would be content with current primitive STT and TTS systems such
that I can speak and the characters can talk. The differentiation
of which character is saying what can be worked out via graphical
cues and such. I just want somebody to put the thing in.
The issue of phonemes as the specific technology is not significant
to me, any more than whether the database being used is relational
or object-oriented, so long as it has the operational
characteristics that I'm after.
MUD-Dev mailing list
MUD-Dev at kanga.nu