[MUD-Dev] Parser engines

Miroslav Silovic miroslav.silovic at avl.com
Fri Mar 12 09:53:34 New Zealand Daylight Time 2004

On Thu, 2004-03-11 at 18:48, Mike Rozak wrote:

> For example: Japanese (3 character sets, no spaces, verb at end,
> enter text with an IME), Chinese (2 character sets, no spaces,
> enter text with an IME), Arabic (non-Roman character set, enter
> text with an IME, ???), or even Finnish (which I've been told
> likes to combine verbs and nouns, or something of the sort). The
> Inform designers guide discusses porting to English's cousins like
> French and German, but not the more distant language groups.

Being a native speaker of a Slavic language (Croatian, to be
precise), I can tell you that any heavily inflected language is a
real chore to work with. It's not just parsing: even outputting
barely grammatical prose (never mind style) is quite a challenge.
The problem is that the output grammar is heavily
context-sensitive. For example (dog = pas, bird = ptica, black =
crn):

  You see a black bird.           Vidis crnu pticu.
  You see a black dog.            Vidis crnog psa.
  You see two black dogs.         Vidis dva crna psa.
  You see five black dogs.        Vidis pet crnih pasa.
  You<plural> see a black dog.    Vidite crnog psa.
  _You_ see a black bird.         Ti vidis crnu pticu.
  Black dog bit you.              Crni pas te je ugrizao.
  Black bird bit you.             Crna ptica te je ugrizla.
  Black bird bit _you_.           Crna ptica je ugrizla tebe.

Note the different noun forms after the numbers 1, 2-4, and >4, the
verb form depending on the subject's gender, elided subjects ('you'
has to be elided from the first 5 sentences; otherwise the sentence
means that _you_ (and not somebody else) see something), and
adjectives mutating depending on their own case and on the gender of
their noun. While Croatian has only 7 noun cases, significantly
fewer than the 16 in Finnish, it has oodles of declensions that are
impossible to get right without resorting to a rather unwieldy
dictionary.
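
To make the dictionary point concrete, here is a rough sketch of
what even a tiny output generator has to look up. It is only an
illustration: the table layout, the key names (acc_one, acc_few,
acc_many) and the function names are made up for this example, and
every Croatian form in it is copied from the sentences above. It
handles only the 1 / 2-4 / >4 split described here and only the two
numerals that appear in the examples.

```python
# Illustrative lexicon. Every form is copied from the example sentences;
# the key names are hypothetical labels, not standard grammar terms.
LEXICON = {
    "pas": {  # dog, masculine
        "gender": "m",
        "nom_sg": "crni pas",
        "acc_one": "crnog psa",
        "acc_few": "crna psa",      # form used after 2-4
        "acc_many": "crnih pasa",   # form used after 5 and up
    },
    "ptica": {  # bird, feminine
        "gender": "f",
        "nom_sg": "crna ptica",
        "acc_one": "crnu pticu",
    },
}

# Only the numerals that occur in the examples (both used with "pas").
NUMERALS = {2: "dva", 5: "pet"}

# The past-tense verb form agrees with the gender of the subject.
BIT = {"m": "ugrizao", "f": "ugrizla"}


def number_class(count):
    """Pick the form class using the 1 / 2-4 / >4 split from the post."""
    if count < 1:
        raise ValueError("count must be at least 1")
    if count == 1:
        return "acc_one"
    if count <= 4:
        return "acc_few"
    return "acc_many"


def you_see(noun, count=1, plural_you=False):
    """Build 'You see ...' with the subject pronoun elided."""
    verb = "Vidite" if plural_you else "Vidis"
    phrase = LEXICON[noun][number_class(count)]
    if count == 1:
        return f"{verb} {phrase}."
    return f"{verb} {NUMERALS[count]} {phrase}."


def bit_you(noun):
    """Build '<noun> bit you' with gender agreement on the verb."""
    entry = LEXICON[noun]
    return f"{entry['nom_sg'].capitalize()} te je {BIT[entry['gender']]}."


if __name__ == "__main__":
    print(you_see("ptica"))
    print(you_see("pas", 2))
    print(you_see("pas", 5))
    print(bit_you("ptica"))
```

Even this toy needs four stored forms for one adjective-noun pair,
and it covers just two of the seven cases. Anything not listed in
the table (another noun, another numeral, another case) simply
fails with a KeyError, which is the "unwieldy dictionary" problem in
miniature.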