[MUD-Dev] Re: Prescience Rules?

Nathan F Yospe yospe at hawaii.edu
Fri Jun 19 08:41:01 New Zealand Standard Time 1998


On Fri, 19 Jun 1998, Richard Woolcock wrote:

:Vadim Tkachenko wrote:

:> Richard Woolcock wrote:

:[big snip]

:> > I would like to (when I have the time) replace the announce command with
:> > a parser to check through spoken strings and determine what the player is
:> > actually saying, creating a 'TRUE/FALSE' lie detection accordingly.

:> Side note: make it external and pluggable, it may consume all the
:> processor power :-)

:I'm not sure...I guess I'll have to see ;)  Here is how I was planning to code
:it...

Oh boy... I suspect I'm the only person on this list to have coded (well, I
have started to code, at least) an industrial strength natural language
parser. As such... this is too much to pass up commenting on, even with my
swamped schedule. (You have noticed I've been quiet of late, no?)

:while not at end of string loop

:   if the current character is a space then
:      interpret the word
:      start a new word
:      continue
:   end if

:   if the current character is non-alphabetic then
:      continue
:   end if

:   if the current letter is a vowel and not the first letter of the word then
:      continue
:   end if

:   if the current letter is the same as the previous letter then
:      continue
:   end if

:   if not the first letter of the word then
:      word <<= 5 bits
:   end if

:   word |= character in lowercase - 'a' + 1

:   if the word is holding 6 values then
:      interpret the word
:      start a new word
:   end if

:end loop

:(roughly speaking)...then I'll have the "interpret_word" function which
:does a switch case and stores some sort of result accordingly...for example
:suppose you had the string "I have never committed diablerie" from my
:previous example...the above function would send the following words one
:after the other:

:I = i = 9
:have = hv = (8<<5)+(22) = 278 (I think)
:never = nvr = (14<<10)+(22<<5)+(18) = something
:committed = cmtd = ...
:diablerie = dblr = ...

:Doing a switch case on the first word (9), the mud determines that the
:talker is referring to themself.  The second word (278) determines that
:the talker is referring to either something they own, or something they
:have done.  The third word would inform the mud that the talker was
:negating their claim.  The fourth word would inform the mud that the
:talker had performed a certain action.  The fifth word would inform the
:mud that the talker was referring to the act of diablerie.  From this
:(and this is the bit I don't yet know how to do) the mud could determine
:that:

:Bubba claims that the act of diablerie was not performed by himself.

:This system wouldn't be perfect, but I could probably get it 'fairly'
:accurate, and I don't think it would be too much of a drain on the
:processor.
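
For anyone who wants to play with the quoted scheme, here is a minimal Python sketch of it. It follows the pseudocode as written, using OR rather than AND to merge each 5-bit letter value. Two things are assumptions of mine and not in the pseudocode: the last word is flushed at the end of the string, and empty words (from repeated spaces) are ignored. The name encode_words is hypothetical.

```python
VOWELS = set("aeiou")
MAX_LETTERS = 6  # six 5-bit values, i.e. 30 bits per packed word


def encode_words(text):
    """Pack each word of `text` into an integer, 5 bits per kept letter."""
    codes = []
    word = 0      # bits packed so far for the current word
    count = 0     # letters stored in the current word
    prev = ""     # previous alphabetic character seen in this word

    for ch in text.lower():
        if ch == " ":
            if count:                    # assumption: ignore empty words
                codes.append(word)
            word, count, prev = 0, 0, ""
            continue
        if not ("a" <= ch <= "z"):       # skip punctuation and digits
            continue
        repeated = ch == prev
        prev = ch
        if ch in VOWELS and count > 0:
            continue                     # drop vowels except a leading one
        if repeated:
            continue                     # collapse doubled letters
        word = (word << 5) | (ord(ch) - ord("a") + 1)
        count += 1
        if count == MAX_LETTERS:         # word is full: emit and start over
            codes.append(word)
            word, count = 0, 0

    if count:                            # assumption: flush the final word
        codes.append(word)
    return codes


print(encode_words("I have never committed diablerie"))
# prints [9, 278, 15058, 112260, 133522]
```

Note that the packing is lossy by design: "have", "hive" and "hove" all reduce to "hv" and so all come out as 278, which means the switch in interpret_word cannot tell them apart.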

I think I've mentioned before, my natural language parser is hosted by the
client, not the server. The tokenized meanings are passed to the server,
along with the string in the case of communication. Now, a bit about my
tokenization sequence:

Tokens are in the form of a 64-bit long. The standardized tokenized
dictionary has each token defined on the host side by meaning-class. A
meaning-class leaves vast stretches of undefined tokens to be added at
a later date, which are similar to the preceding set. The token ranges
are broken into nouns, verbs (further broken into 32 tenses by the last 5
bits of each), adjectives, adverbs, relationals, states, abstracts, and
associatives. Names, incidentally, are associatives until the client can
identify a possible noun value for the name. The categories are broken up
by the first three bits of the token. Some of these are going to be
(obviously) quite empty, but... well, hell, with 56 bits to play with, I
suspect most of the categories are going to be fairly empty. The good
part about the range-relationships is that the association value of 
unknown words is subject to mutative weighting on the parser's neural nets.
In other words, eventually, the parser could learn to *really* understand.
The bad part is, the client has to initially download a *monstrous* lex
file. Each word in the lex file corresponds to one token. Several words
share each alphanumeric key. All eligible words for that key compete for
a parse. Some tokens will, in a given language (I only intend to complete
English and Japanese myself) refuse to evaluate in context of other tokens.
This narrows meanings. The parsed sentence produces a number of meaning
strings. Any meaning string with a low probability evaluation is discarded.
Remaining strings are weighted for probability on local context. Local
objects have token keys associated with them. Descriptions of local objects
are derived from favorable assembly of the reversed parse process... they
are described by the words affiliated with their primary tokens. This should
allow multiple languages to interact smoothly, even to the point of (maybe)
someday allowing a series of possible translations for client to client
communication between speakers of different languages. (I know, I know...
"The spirit was willing, but the flesh was weak." ... but it's better than
nothing, and that would show up *very* low on the probability scale. I hope.)
The side effect here is that most intelligent NPCs will be run remotely, by
"smart clients"... in other words, I have a "client" that is designed to
act on the tokenized meanings it receives, rather than parse them to words,
and that will thus have some degree of intelligence... hooked up to an "NPC"
that can also be used by admins... and if it is designed for conversation,
hopefully one of these will someday successfully pass Turing tests.
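
To make the token layout concrete, here is a rough Python sketch of one way a 64-bit token could be packed: category in the first three bits, and for verbs the tense in the last five. The numeric order of the categories, the function names, and the idea of a verb "stem" filling the bits in between are all assumptions made for illustration, not the actual format. Under this layout a verb has 64 - 3 - 5 = 56 bits left for the stem.

```python
from enum import IntEnum


class Category(IntEnum):
    # Assumed ordering; only the eight category names come from the text.
    NOUN = 0
    VERB = 1
    ADJECTIVE = 2
    ADVERB = 3
    RELATIONAL = 4
    STATE = 5
    ABSTRACT = 6
    ASSOCIATIVE = 7


CATEGORY_SHIFT = 61   # 64 bits minus the 3 category bits
TENSE_BITS = 5        # 32 tenses
STEM_BITS = CATEGORY_SHIFT - TENSE_BITS  # 56 bits left for a verb stem


def make_token(category, meaning):
    """Combine a category with a meaning value that fits in 61 bits."""
    if not 0 <= meaning < (1 << CATEGORY_SHIFT):
        raise ValueError("meaning does not fit below the category bits")
    return (int(category) << CATEGORY_SHIFT) | meaning


def make_verb(stem, tense):
    """Build a verb token: category, then stem, then a 5-bit tense."""
    if not 0 <= tense < (1 << TENSE_BITS):
        raise ValueError("tense must be in 0..31")
    if not 0 <= stem < (1 << STEM_BITS):
        raise ValueError("stem does not fit in 56 bits")
    return make_token(Category.VERB, (stem << TENSE_BITS) | tense)


def category_of(token):
    """Read the category back from the first three bits."""
    return Category(token >> CATEGORY_SHIFT)


def tense_of(token):
    """Read the tense from the last five bits of a verb token."""
    if category_of(token) != Category.VERB:
        raise ValueError("only verb tokens carry a tense")
    return token & ((1 << TENSE_BITS) - 1)


token = make_verb(stem=1234, tense=7)
print(category_of(token).name, tense_of(token))  # prints VERB 7
```

Because the category sits in the high bits, all tokens of one category form a single contiguous numeric range, which is what lets unassigned tokens sit next to related ones.
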
--

Nathan F. Yospe - Aimed High, Crashed Hard, In the Hanger, Back Flying Soon
Jr Software Engineer, Textron Systems Division (On loan to Rocketdyne Tech)
(Temporarily on Hold) Student, University of Hawaii at Manoa, Physics Dept.
yospe#hawaii.edu nyospe#premier.mhpcc.af.mil http://www2.hawaii.edu/~yospe/





