[MUD-Dev] Persistant storage.... My current idea.

J C Lawrence claw at under.engr.sgi.com
Fri Apr 3 15:42:11 New Zealand Daylight Time 1998

Forwarded to the list for Cynbe:

Date: Fri, 3 Apr 1998 17:45:22 -0600
From: Cynbe ru Taren <cynbe at muq.org>
Subject: [MUD-Dev] Persistant storage.... My current idea. 

| > On Wed, 1 Apr 1998, J C Lawrence wrote:
| >> A very simple cheat which performs very nicely is to do something
| >> as follows:
| >> 
| >> Every record in the DB is identified by a unique record # of a
| >> signed integer type.
| >> 
| >> The DB consists of two files, the database itself, and an index.
| >> 
| >> The index file is an array of the following structures:
| >> 
| >>   struct { 
| >>     off_t record_pos; // Offset of the record in the DB 
| >>     size_t record_len; // Length of the record in the DB 
| >>   }
| >>
| > I still see no reason to store the index on disk.  

Just a quick note, and sorry if I keep harping on Muq -- it just
so happens that after five solid years of working on it spare-time,
I know it better than I do other systems. :)

In Muq (http://muq.org/~cynbe/muq.html) I've done a fairly carefully
performance-tuned module vm.t which is designed to be usable with
other systems (i.e., it isn't inextricably linked with all my other
Muq code/modules) along the above sorts of lines:

Every object is identifed by an abstract pointer.

(Originally, this was a 32-pointer, currently I've moved to 64-bit
pointers using gcc's "long long" support, but this is selectable by
changing a single #define.)

This abstract pointer is in fact an integer containing two bitfields:
one specifies the object's size to the nearest power of two.

For each size octave, I maintain a db file which is essentially an
array of records of that size.

Another bitfield in the abstract pointer gives the offset of the
record in question within the relevant file.

The encoding I use actually allocates bigger offset fields to the
smaller octaves, since one is more likely to have (say) 1,000,000
8-byte objects in a system than 1,000,000 1-megabyte objects. This
is pretty important in the 32-bit implementation, less so in the
64-bit implementation.

The resulting design lets me read any object in the db into memory
in -exactly- one read(), which is pretty much provably optimal, and
likely to be at least twice the performance of many index-file based
designs.  (A factor of two in disk access speed just might matter to
some people... :)

Meanwhile, I keep a cache buffer in ram of recently accessed objects:
I'm not comfortable with the time and space overhead of LRU management,
so instead I manage them essentially as a FIFO queue that objects
cycle through once loaded into cache:  They are loaded at one end
and deleted at the other.  Since deleted objects will still be in the
OS file system cache in general, frequently used objects will get
'read' back in very quickly, giving (I hope, no performance studies
yet) much of the performance of an LRU system in practice, without
much of the overhead.

I allocate a (very) carefully tuned hashtable about 1/8 the size of
the ram buffer to map abstract pointers to current addresses: This
mapping takes about 15 risc cycles, which may sound like a lot
compared to single-cycle access via a hard pointer, but (given a
little intelligent coding, like looking up a string address just
once when copying it, not once per byte) turns out according to gprof
to be consuming only about 5% of my CPU time when Muq is running: I
feel this is a very reasonable price to pay for disk-based


J C Lawrence                               Internet: claw at null.net
(Contractor)                               Internet: coder at ibm.net
---------(*)                     Internet: claw at under.engr.sgi.com
...Honourary Member of Clan McFud -- Teamer's Avenging Monolith...

More information about the MUD-Dev mailing list