[MUD-Dev] Naming and Directories?

Ola Fosheim Grøstad <olag@ifi.uio.no> Ola Fosheim Grøstad <olag@ifi.uio.no>
Wed Mar 17 17:28:24 New Zealand Daylight Time 1999


"Jon A. Lambert" wrote:
> But you have changed your requirements above from speed to memory.=20
> No fair. ;)

fragmented memory access =3D=3D slow =3D> memory =3D=3D speed

> The alphabet trie above has some pros and cons:
>=20
> 1) avoids the overhead of string hashing

Uh? Hashing is most likely faster, unless you deliberately make it slow.=20
It isn't too difficult to come up with a reasonable hashing scheme for
small sets (without delete, or with lazy delete)

first hash =3D=20
	RANDOMLUT[string[0]] +=20
	RANDOMLUT[string[stringlength/2]+OFFSET1] +=20
	RANDOMLUT[string[stringlength-1]+OFFSET2] + stringlength;

on collision, use a second hash, e.g. =3D

	hash=3Dstringlength;
	switch(stringlength){
		default: do whatever for string[8..stringlength]
		case 8: hash +=3D RANDOMLUT[string[7]+OFFSET10];
		case 7: hash +=3D RANDOMLUT[string[6]+OFFSET9];
		case 6: hash +=3D RANDOMLUT[string[5]+OFFSET8];
		case 5: hash +=3D RANDOMLUT[string[4]+OFFSET7];
		case 4: hash +=3D RANDOMLUT[string[3]+OFFSET6];
		case 3: hash +=3D RANDOMLUT[string[2]+OFFSET5];
		case 2: hash +=3D RANDOMLUT[string[1]+OFFSET4];
		case 1: hash +=3D RANDOMLUT[string[0]+OFFSET3];
	}


> 2) provides intrinsic support for abbreviations

I don't think abbreviations is very good from a usability point of view,
it can lead to unexpected results. Besides, you can get that with hashing
too, just insert the substrings as well.

> 1) based on English and may not support blanks, puntuation, or other=20
> characters   (of course this can be altered or enforced)

Easily for limited charsets using a translation table, which would be
handy because you could make it case insensitive at the same time.

N-ary tree example:

	unsigned char table[256];
	fill table with the value 0;
	table['a']=3Dtable['A']=3D1;=20
        ...
	table['z']=3Dtable['Z']=3D27;
	table['=E6']=3Dtable['=C6']=3D28;
	table['=F8']=3Dtable['=D8']=3D29;
	table['=E5']=3Dtable['=C5']=3D30;

	struct node {
	  node *branches[31];
          object *obj;
          node(){ branches[0..31] =3D obj =3D NULL; }
	};

	object *findobj(const string &s,const node *n){
	  for(int i=3D0; i<s.length(); i++){
	    n =3D n->branches[table[s[i]]];
            if(n=3D=3DNULL) return NULL;
	  }
          return n->obj;
	}

	object *findobj(const char *s,const node *n){
	  while(*s) {
	    n =3D n->branches[table[*s++]];
            if(n=3D=3DNULL) return NULL;
	  }
          return n->obj;
	}

I suspect you can make it faster for successfull hits with zero
terminated strings by making zero a symbol... (avoiding the n=3D=3DNULL t=
est,
which isn't too expensive anyways btw)  Hmmm, could be fun to
implement... Would be useful as you could allow " ,.:;" to function as
string terminators as well. An advanced version could use multiple
translation tables and compress the N-ary tree nodes based on profiling
information.

Sketchy:

        unsigned char emptytable[256] =3D {0,0,0,0,0...};
        unsigned char terminatortable[256] =3D x in "\0 ,.:;" ? 1 : 0;
        unsigned char smalltable[256] =3D terminatortable + alphas...
        unsigned char largetable[256] =3D smalltable + rare characters
       =20
	class node { public:
          object *obj;
	  unsigned char *table;
	  node *branches[1]; //have to fake this size with malloc :-x
          node(){obj=3DNULL; table=3Demptytable; branches[0]=3D&errornode=
; }
	};

        node emptynode =3D {NULL, emptytable, NULL};
        node errnornode =3D {NULL, emptytable, emptynode};

	findobj(s,n){
	  do {
            m =3D n;
	    n =3D n->branches[n->table[*s]];
	    s++;
	  } while(n);
	  return m->obj;
	}

Bwahahaha, I love hacks like these...

	findobj(char *s,node *n){
          s-=3D2;
	  do {
	    s+=3D2;
	    m =3D n->branches[n->table[s[0]]];
	    if(!m) return n->obj;
	    n =3D m->branches[m->table[s[1]]];
	  } while(n);
	  return m->obj;
	}

:*)

Now if Raph could post all the names to all the players of Ultima Online
then we could see what actually would happen in a real life situation...
I guess one could use IRC as well. Hmm...

> The hash algorithm would treat the name string as either a 2 byte or 4 =
byte
> unsigned int which should be cheaper to hash than the entire string.  O=
f
> course, you would have to enforce a mininum name length.

Why a minimum name length? All you need is to store the strings zero
padded up to the alignment boundary, which should not be too expensive as
you often end up with such alignments anyway.

Examples:=20
	zeropad("hello") =3D=3D 'hell','o\0\0\0'
	zeorpad("hell") =3D=3D 'hell', '\0\0\0\0'
	zeropad("?") =3D=3D '?\0\0\0'

The only trouble would be if you work with pointers directly into string
buffers. There are ways to compensate for this though. For a 32 bit
implementation: write 4 versions and use masking, and make stringlength<3
a special case as you risk to mask out on both ends... (and remember
safe-hex!)

Anyway, it is almost always faster to choose a data structure that only
supports the functionality one looks for. So really, it doesn't make any
sense to discuss this stuff without some more knowledge about what kind
of statistical properties, and operation sequences we are talking about.
--
Ola Fosheim Groestad,Norway      http://www.stud.ifi.uio.no/~olag/



_______________________________________________
MUD-Dev maillist  -  MUD-Dev at kanga.nu
http://www.kanga.nu/lists/listinfo/mud-dev




More information about the MUD-Dev mailing list