[MUD-Dev] Parser engines
miroslav.silovic at avl.com
Fri Mar 12 09:53:34 New Zealand Daylight Time 2004
On Thu, 2004-03-11 at 18:48, Mike Rozak wrote:
> For example: Japanese (3 character sets, no spaces, verb at end,
> enter text with an IME), Chinese (2 character sets, no spaces,
> enter text with an IME), Arabic (non-Roman character set, enter
> text with an IME, ???), or even Finnish (which I've been told
> likes to combine verbs and nouns, or something of the sort). The
> Inform designers guide discusses porting to English's cousins like
> French and German, but not the more distant language groups.
Being a native speaker of a Slavic language (Croatian, to be
precise), I can tell you that any heavily inflected language is a
real chore to work with. It's not just parsing; even outputting
barely grammatical prose (without worrying about style) is quite a
challenge. The problem is that output
grammar is heavily context-sensitive. For example (dog = pas, bird =
ptica, black = crn):
  You see a black bird.         Vidis crnu pticu.
  You see a black dog.          Vidis crnog psa.
  You see two black dogs.       Vidis dva crna psa.
  You see five black dogs.      Vidis pet crnih pasa.
  You<plural> see a black dog.  Vidite crnog psa.
  _You_ see a black bird.       Ti vidis crnu pticu.
  Black dog bit you.            Crni pas te je ugrizao.
  Black bird bit you.           Crna ptica te je ugrizla.
  Black bird bit _you_.         Crna ptica je ugrizla tebe.
Note the different noun forms after the numbers 1, 2-4, and >4, the
verb form depending on the subject's gender, elided subjects ('you'
has to be elided from the first 5 sentences; otherwise the sentence
means that _you_ (and not somebody else) see something), and
adjectives mutating depending on their own case and on the gender of
their noun. While Croatian has only 7 noun cases, significantly
fewer than the 16 of Finnish, it has oodles of declensions that are
impossible to get right without resorting to a rather unwieldy
dictionary.
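
To make the bookkeeping concrete, here is a rough Python sketch of
what an output routine would have to do for just the dog and bird
sentences above. Everything in it (the function names, the table
layout, the "one"/"few"/"many" labels) is made up for illustration,
and the tables hold only the forms quoted in this post. It follows
the 1 / 2-4 / >4 split exactly as stated above and does not treat
compound numbers specially, so it is nowhere near a real generator.

```python
# Hypothetical mini-lexicon: only the word forms quoted in the post.
# A real system would need such an entry for every noun and adjective.
DOG_OBJECT_FORMS = {
    "one": "crnog psa",    # object form for exactly one dog
    "few": "crna psa",     # form used after 2-4
    "many": "crnih pasa",  # form used after 5 and up
}

# Subject phrase plus grammatical gender (m = masculine, f = feminine).
SUBJECTS = {
    "dog": ("Crni pas", "m"),
    "bird": ("Crna ptica", "f"),
}

# The past-tense verb has to agree with the subject's gender.
BIT_YOU = {
    "m": "te je ugrizao",
    "f": "te je ugrizla",
}


def count_class(n):
    """Map a count onto the three ranges named in the post.

    Assumption: only the simple 1 / 2-4 / >4 split is modelled.
    """
    if n < 1:
        raise ValueError("count must be at least 1")
    if n == 1:
        return "one"
    if n <= 4:
        return "few"
    return "many"


def you_see_dogs(n, numeral=None):
    """Build 'You see N black dog(s)' with the subject 'you' elided."""
    cls = count_class(n)
    phrase = DOG_OBJECT_FORMS[cls]
    if cls == "one":
        return f"Vidis {phrase}."
    shown = numeral if numeral is not None else n
    return f"Vidis {shown} {phrase}."


def bit_you(subject):
    """Build 'Black <subject> bit you' with gender agreement on the verb."""
    phrase, gender = SUBJECTS[subject]
    return f"{phrase} {BIT_YOU[gender]}."


print(you_see_dogs(1))
print(you_see_dogs(2, "dva"))
print(you_see_dogs(5, "pet"))
print(bit_you("dog"))
print(bit_you("bird"))
```

Even this toy needs a separate hand-written table per noun phrase,
which is the "unwieldy dictionary" problem in miniature.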