[MUD-Dev] Re: From DevMud: Database module

Greg Connor gconnor at nekodojo.org
Sat Jan 16 18:15:23 New Zealand Daylight Time 1999


>> = Greg Connor<gconnor at nekodojo.org> wrote:
>> [Regarding "trying to develop an API for a generic database module".]
>
>> So, I'm now working on an alternate proposal, that would break
>> records up into fields that the caller defines ahead of time.  You
>> could use the interface for storing a single binary, even the
>> in-memory representation of a struct, but in order to properly
>> sort and traverse/search the data, the database needs to either:
>>   - learn about the structure of the data, to know which bytes to index,
>>   - call back to a client function to deconstruct it, or
>>   - just have the client hand in pre-separated fields in the first place :)


I have another draft of this proposal ready, and will send it along in a
separate message.  The new design allows you to store "blobs", or to break
records out into separate fields.
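To illustrate how one table could accept either opaque blobs or pre-separated fields, here is a minimal sketch. It assumes a hypothetical `Table` class and an optional client-supplied extractor callback (the "call back to a client function" option quoted above); none of these names come from the actual proposal.

```python
from typing import Callable, Dict, Optional, Union

# A record is either an opaque blob or a dict of named fields.
Record = Union[bytes, Dict[str, object]]


class Table:
    """Hypothetical table that stores either blobs or field dicts."""

    def __init__(self, extractor: Optional[Callable[[bytes], Dict[str, object]]] = None):
        # extractor: optional client callback that deconstructs a blob
        # into named fields so the database knows what it could index.
        self.extractor = extractor
        self.rows: Dict[str, Record] = {}

    def put(self, key: str, record: Record) -> None:
        self.rows[key] = record

    def fields(self, key: str) -> Dict[str, object]:
        record = self.rows[key]
        if isinstance(record, dict):
            return record                   # client handed in fields directly
        if self.extractor is not None:
            return self.extractor(record)   # call back into client code
        return {}                           # opaque blob: nothing to index
```

Without either fields or an extractor, the table degenerates into a plain blob store that can only look things up by key.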


This is a response to J C's previous message (which for some reason I
didn't see in my muddev box until just now :)


> = J C Lawrence wrote:
>I'm really not sure how to answer this one in less than several
>thousand words.  You are staring at Pandora's box.
>
>There are in essence two approaches to data in data bases:
>
>  1) Objects are opaque.
>  2) Objects are compounds or aggregates of known structure.
>
>#1 is well known and obvious.  The simple summary is that the
>database is utterly dumb and knows nothing about the data it is
>storing.  It just stores "blobs" and allows you to access those
>opaque blobs via some key or index mechanism.  dbm and all its
>derivatives, as well as all the tony-* clan servers, MOO, COOL,
>Cold, etc all derive from this.


I was going for something similar to this with the first proposal, but it
turned out to be little more than a glue layer over DBM, useful mainly if
you later wanted to put an SQL server behind the same interface in place
of your DBM database.  That is arguably of little value over just using
DBM directly, and it may add overhead.
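For comparison, a "glue layer over DBM" of that kind might look like the following sketch. It uses Python's standard `dbm.dumb` module as a stand-in for the C dbm library; the `BlobStore` name and its methods are hypothetical. The only question it can answer is "give me the blob stored under this key".

```python
import dbm.dumb


class BlobStore:
    """Hypothetical thin wrapper: opaque key -> blob, nothing more."""

    def __init__(self, path: str):
        # "c" creates the underlying files if they do not exist yet.
        self._db = dbm.dumb.open(path, "c")

    def put(self, key: bytes, blob: bytes) -> None:
        self._db[key] = blob

    def get(self, key: bytes):
        try:
            return self._db[key]
        except KeyError:
            return None

    def delete(self, key: bytes) -> None:
        del self._db[key]

    def close(self) -> None:
        self._db.close()
```

Swapping in an SQL backend would only mean reimplementing these four methods, which is exactly why such a layer adds so little on its own.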


>#2 is where RDBMS'es, SQL, and the rest of that horde enter.  It
>says that the data comprising objects (or records) is not only
>known, but can be usefully indexed, accessed, or otherwise
>manipulated in intelligent fashions.  


Well, there are some things that I as a programmer would like the database
to take care of for me, rather than building them on my own, and that DBM
and other "blob" databases don't provide.  I guess I am aiming for some
middle ground between DBM and SQL/RDBMS.

Specifically, the proposal for a generic database assumes that the data
will be relational somehow, but doesn't manage those relationships or keep
them synchronized.  That might be added at a higher level or in an
"enhanced" version, but right now it isn't "generic" enough that I can
add value there without focusing on a single client.

However, I think indexing and searching/sorting is something that any
developer could use, and I think it might be a useful tool (for example, in
implementing a relationship, you would want the related field to be indexed).
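As an illustration of what indexing and searching/sorting could add over a plain blob store, here is a minimal in-memory sketch of a secondary index that is kept up to date on every put and delete. The `IndexedTable` class is hypothetical; a disk-based engine would more likely use a B-tree than a sorted Python list.

```python
import bisect
from collections import defaultdict


class IndexedTable:
    """Hypothetical table with one secondary index on a named field."""

    def __init__(self, indexed_field: str):
        self.field = indexed_field
        self.rows = {}                      # primary key -> dict of fields
        self.by_value = defaultdict(set)    # field value -> set of keys
        self.sorted_values = []             # distinct values, kept sorted

    def _unindex(self, key):
        old = self.rows.get(key)
        if old is None:
            return
        value = old[self.field]
        self.by_value[value].discard(key)
        if not self.by_value[value]:
            del self.by_value[value]
            self.sorted_values.remove(value)

    def put(self, key, row):
        self._unindex(key)                  # drop any stale index entry first
        self.rows[key] = row
        value = row[self.field]
        if value not in self.by_value:
            bisect.insort(self.sorted_values, value)
        self.by_value[value].add(key)

    def delete(self, key):
        self._unindex(key)
        self.rows.pop(key, None)

    def find(self, value):
        """Keys whose indexed field equals value, without a full scan."""
        return set(self.by_value.get(value, ()))

    def keys_in_order(self):
        """Traverse keys sorted by the indexed field."""
        for value in self.sorted_values:
            yield from sorted(self.by_value[value])
```

Usage: `t = IndexedTable("location")`, then `t.find("room42")` answers "which records have this location" directly from the index.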

I am also trying to address what I see as a "disconnect" between developers
of some older muds (like Mush, Moo, etc) and the tools that are available
in the DBMS world.  That is, there are techniques like algorithms and data
structures that can be used to solve some common problems, but the muds I
am familiar with don't seem to use them.  (Why use a disk-based data
structure, when you can unpack everything to Ram and then relegate it to
swap space?  Oh, except that bothersome checkpointing, and it will never be
distributed, etc :)

However, I am not going to put myself in the position of saying "Here's a
set of predefined fields, why not use this structure for your mud you
haven't developed yet?"  In other words, this "generic database" effort is
an effort to provide the tools to build and work with tables, not a
pre-defined set of tables.


>In the general business case this is all well and good.  The data
>falls into well defined patterns that are known in advance and can
>be accommodated in an elegant manner.  Unfortunately these
>characteristics are not commonly shared with MUDs.
>
>Assume only object oriented MUD servers (well, object inheritance
>really).  Loosely speaking, MUDs fall into two categories:
>
>  a) Designs which have pre-defined and well known object
>hierarchies.
>
>  b) Designs which allow end-user defined object hierarchies.
>
>For an RDBMS #a is a simple case.  You build tables, one table per
>base object type, one column per object attribute or method, one row 
>per object.  As your total set of object types is well known and
>defined, your total number of tables is finite, documented, and can
>be explicitly programmed against.
>
>Interestingly, Diku and Aber are moderately good examples of #a.
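To make case #a concrete: with a fixed, server-defined hierarchy, each base object type maps onto one table. A sketch using Python's built-in `sqlite3` module; the `weapon` table and its columns are invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table per base object type, one column per attribute, one row per object.
conn.execute("""
    CREATE TABLE weapon (
        id       INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        damage   INTEGER NOT NULL,
        location INTEGER
    )
""")
conn.executemany(
    "INSERT INTO weapon (id, name, damage, location) VALUES (?, ?, ?, ?)",
    [(1, "longsword", 8, 42), (2, "dagger", 4, 42), (3, "club", 6, 7)],
)
# Because the columns are known in advance, queries can be written against them.
heavy = conn.execute(
    "SELECT name FROM weapon WHERE damage >= ? ORDER BY damage DESC", (6,)
).fetchall()
```

The catch is that adding an attribute later means an `ALTER TABLE`, which is exactly what case #b makes unpredictable.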


I'm not going to try to define "what constitutes an object" and I'm trying
to get away from using the word "object" to define "a single thing that the
database stores".

(Some implementers might choose to create tables where one row is one
object, and each column is a distinct method or attribute...  I don't know
if this would be an effective use of an RDBMS, but I would have to see it
in practice before saying for sure.)

(Another implementation might be to have a table for "objects", another for
"object types" possibly others for "attributes" and "methods" - this allows
the RDBMS to focus on what it's good at: managing dynamic relationships
between different sets of somewhat homogeneous entities.  However, I'm not
going to focus on either of these, because I'm not ready to integrate an
RDBMS - the Relational part is beyond the scope of the current project.)
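The second layout mentioned above (a table for objects, another for object types, and others for attributes) is often called an entity-attribute-value design. A minimal `sqlite3` sketch with purely illustrative table and column names: because attributes are rows rather than columns, new attribute names need no schema change. Values are stored as TEXT here only for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE object_type (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE object (
        id      INTEGER PRIMARY KEY,
        type_id INTEGER REFERENCES object_type(id)
    );
    CREATE TABLE attribute (
        object_id INTEGER REFERENCES object(id),
        name      TEXT,
        value     TEXT,
        PRIMARY KEY (object_id, name)
    );
""")
conn.execute("INSERT INTO object_type VALUES (1, 'creature')")
conn.execute("INSERT INTO object VALUES (100, 1)")
# Any attribute name can be added at runtime; no ALTER TABLE is needed.
conn.executemany(
    "INSERT INTO attribute VALUES (?, ?, ?)",
    [(100, "name", "Bubba"), (100, "hp", "12"), (100, "mood", "grumpy")],
)
attrs = dict(conn.execute(
    "SELECT name, value FROM attribute WHERE object_id = ?", (100,)
))
```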

Anyway, from the point of view of the database module and the backend
engine alike, using one huge table or several different tables is the
client's choice... I'm not going to make assumptions
like "Oh, this piece of data is a Method, I should treat it differently
from some other string of bytes".  I'm probably going to define very few
data types, and they're probably going to be pretty atomic (ie. close to
the machine or compiler-native level).  I think there is still some
incremental value that can be added over the flat-file or dbm-hashed-table
standard fare.
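One way to read "very few data types ... close to the machine level" is a small set of fixed-size field types that the module can pack into bytes and compare without knowing what the fields mean. A sketch using Python's `struct` module; the type codes and the `Schema` class are assumptions, not part of the proposal.

```python
import struct
from enum import Enum


class FieldType(Enum):
    # Each atomic type maps to a struct format code (fixed size, no padding).
    INT32 = "i"
    UINT32 = "I"
    FLOAT64 = "d"
    KEY = "16s"      # hypothetical fixed-width reference/name; null-padded


class Schema:
    """Hypothetical client-defined record layout: ordered (name, type) pairs."""

    def __init__(self, *fields):
        self.names = [name for name, _ in fields]
        # ">" means big-endian with standard sizes and no alignment padding.
        self.fmt = ">" + "".join(ftype.value for _, ftype in fields)

    def pack(self, record: dict) -> bytes:
        return struct.pack(self.fmt, *(record[n] for n in self.names))

    def unpack(self, blob: bytes) -> dict:
        return dict(zip(self.names, struct.unpack(self.fmt, blob)))
```

Note that a `KEY` field comes back with its trailing null padding, so callers would strip it.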

So, in terms of the project I am proposing, an "object" is a pretty atomic
piece of data, or a simple homogeneous construction of simple types.  I
assume that there will be another useful layer at which "object" means
something else, either to the client or the other-module developer.

In other words, I am rejecting the notion that you have to know everything
about the intended applications in order to build a good tool.  I know this
is not what you were implying... I'm just noticing that there's a general
trend to gravitate toward both ends of the spectrum rather than settling in
between.  On one side, you know a great deal about the structure of the
data, so you're able to craft a storage scheme that's highly tuned to it.
On the other side, you know almost nothing about the data, and your tool
has almost no features at all other than "read" and "write".  I'm proposing
that additional value can be added while still keeping the tools "generic".


>#b is a bitch.  This isn't just things like Cool or ColdX which
>allow the object hierarchies or inheritance to be edited at runtime
>with the results instantly reflected in the world -- it's *ANYTHING*
>that allows something __other__ than the core server to define the
>object hierarchies.  (Think of the difference in hierarchies and
>base design assumptions in the various LP-based MUDlibs.)  The key is
>that the server does not define the object hierarchy.
>
>With #b what we have now is genericism across the board.  We don't
>and can't know in advance what any of our base classes are going to
>look like for the final representation.  Even if we do mandate a few
>starting base classes, what gets built on top of those can be
>anything.  


Yes, this is closer to what I'm proposing.  Of course, I'm not planning on
delivering all of this, just the database layer.  The question is, can you
really separate what is part of the "database layer" and what is part of
the game engine?  I think you can.


>...Ergo our previously neat tables with one column/field per 
>method are now disjoint as we have no way of predicting how many
>methods an arbitrary object is going to have, and what the impact
>and significance of those attributes will be.


I didn't propose that fields be stored as columns, but yes, this would be
one problem with such an approach.


>Classical OO DB's are not designed for this case (see the archives
>for some interesting URLs on the area posted by Lambert).  The OO
>DBMS'es are variations on the pre-canned phenomenon: you pre-define
>the hierarchy and then execute from there, having mandated that
>nothing will ever change.


I have seen a couple of OO db's... I couldn't tell whether being
object-oriented added any value to the DB that couldn't be added by the
application layer for the same cost (or less).  In other words, I am not
convinced that you need an Object-Oriented database engine to implement an
object-oriented game (unless your development language of choice is SQL :)
(This is similar to the reason you don't need a multi-threaded OS or
multi-threaded program to implement a multi-threaded interpreter)


>The result, if you're going to go for the flexible deal is that you
>have to work at very high abstraction levels.  You don't and can't
>know what you are working on, so you have to be able to handle
>(mostly) anything, and figure the rest out while you go.  

>...Your code
>has to be self-intelligent and figure out your hierarchy and other
>relativistic structures at runtime and then interpret their
>significance, possibly with the help of embedded hints in the
>user-defined structure.


I split up the above paragraph because it seemed to be two different ideas.
Figuring out the hierarchy is probably going to be done at a higher level
than the database, I'm guessing.  Do you think there's a reason why the
database engine needs to figure out what kind of hierarchy there is?  Do
you think that if it doesn't, it's not really adding value?


>  ie  To access your database you need an interpreter which can
>figure out what the mess means and how it all really relates and
>from there what actually is an object etc etc etc.
>
>Translation: Your DB is opaque and you now have a translation layer
>between your DB and your code that tells your code what the opaque
>DB records really mean etc.  Doesn't this sound familiar?  Somehow
>we've gone back to the very spot we started with: opaque DB's with
>the code, not the DB, interpreting the contents of the
>objects/records.


Right... as you said, there will be a game-engine that knows this, and a
backing store that doesn't.  Is that a problem?

Example:  I want to sell a DBMS, and I want it to be suitable for a broad
set of operations that are still somewhat generic in nature.  I'm afraid,
though, that nobody will buy it, because they need accounting software, and
this doesn't do accounting, and they need inventory tracking, and this
doesn't do that.  There is still a substantial gulf between what I am
developing and what the shrink-wrap customers will buy.  But should I get
scared away by this?  If so, Oracle and Sybase will make all the money...


>Aaaaargh.
>
>Further, taking this approach objects start devolving into
>structural relationships of their components instead of the more
>typical _behavioural_ definition of an object.  As such an object
>becomes a (potentially) nested collection of various data types:
>attributes, methods, etc (table for inheritance, table for methods,
>table for attributes, etc).
>
>This devolution of objects into over-muscled structures destroys
>much of the value of objects (their internal opacity), and adds an
>incredible overhead in the number of table queries which have to
>occur for even simple resolutions.  Performance suffers, badly.
>
>I did some early work playing about in this area (see archives for
>things I did with SQL).  It was not pretty.  I'll admit to being
>largely SQL ignorant.  It still shouldn't have been *that* ugly.
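The query overhead described above shows up as soon as inheritance itself is stored relationally. In this sketch (`sqlite3`, with hypothetical `inherits` and `attribute` tables, and single inheritance only for brevity), resolving one attribute walks the parent chain and issues two queries per ancestor visited; multiple inheritance or a runtime-editable hierarchy only makes this worse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inherits (child INTEGER PRIMARY KEY, parent INTEGER);
    CREATE TABLE attribute (
        object_id INTEGER, name TEXT, value TEXT,
        PRIMARY KEY (object_id, name)
    );
    INSERT INTO inherits VALUES (3, 2), (2, 1);   -- 3 -> 2 -> 1 (root)
    INSERT INTO attribute VALUES (1, 'desc', 'A generic thing.');
    INSERT INTO attribute VALUES (3, 'name', 'Bubba');
""")


def resolve(obj_id, name):
    """Return (value, queries_issued) for an attribute, searching ancestors."""
    queries = 0
    current = obj_id
    while current is not None:
        queries += 1                       # look for the attribute on this object
        row = conn.execute(
            "SELECT value FROM attribute WHERE object_id = ? AND name = ?",
            (current, name),
        ).fetchone()
        if row is not None:
            return row[0], queries
        queries += 1                       # not found: fetch the parent
        row = conn.execute(
            "SELECT parent FROM inherits WHERE child = ?", (current,)
        ).fetchone()
        current = row[0] if row else None
    return None, queries
```

Here `resolve(3, "desc")` costs five round trips for a single inherited attribute.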


I am interested, because this seems similar to what I'm proposing.  There
are lots of successful database applications that use hundreds of tables
and several table lookups per transaction.  Of course it's never going to
be as fast as alloc'ing something in ram, but then you have scalability and
distribution to think about.

In other words, SQL and other backends work, and quite a lot of things are
based on top of them.  I'm not sure what's different about Muds.  

At the other extreme, if you know your data intimately, you can nearly
always make a database implementation that does things faster and with less
overhead.  I am guessing that a few mud servers do this, but this means the
database is specific to them, and has to be co-developed with the mud.  The
DB developer doesn't want to spend a lot of time for a backend that only
has one customer, and the mud game-engine developer is kind of locked in to
that data structure now.  (I'm also betting that most mud developers don't
have the time and energy to do this, and end up writing their own thing that
is pretty rudimentary, then adding more complexity to the game engine to
account for missing features in the database.)


>A large part of the problem (for me) is that I explicitly want an
>undefined (at the server level) object inheritance hierarchy.  The
>server natively supports multiple inheritance and allows multiple
>roots.  Further, the object hierarchy is editable at runtime.  None
>of these characteristics created the problem; it was there already.
>They merely exacerbated it.
>
>I required an OO DBMS which was capable of operating on arbitrary
>object hierarchies and definitions and thus adding definitional and
>relativistic value to the objects it contained.  Nobody, and this
>includes the DB research establishments, appears to have done that
>yet.
>
>There is hope, however, if a wan sort of hope.


This is probably a small part of the demand in the dbms market, I am
guessing.  I agree, this would be a cool invention to have.  I think most
DBA's who buy dbms packages want to write once and leave the structure the
same for the life of the end-user application... so I think if there were
such a beast, a lot of folks may not even notice.


>If you look at the DB services offered by things like ColdX and co,
>in essence they are (very poorly featured) OO DBMS'es which allow
>and support arbitrary object hierarchies etc.  However what you have
>to look at is the ColdX interface, not the underlying opaque DB
>interface.  In this view ColdX itself is an OODBMS with an interface
>language called ColdC.  The opaque DBMS underlying it is merely a
>storage technique and really has no relevance to the final interface.


I am proposing something closer to the second (underlying) db layer, than
the first.  Mostly because I don't think "oo" and "dbms" add a lot of value
when mixed.


>
><<If this message is disjoint I apologise.  This system PANICed
>twice while composing it while various other RL emergencies also
>intervened>>


No problem :)  I think I repeated myself a time or two in the reply too.  I
appreciate that you took the time to reply despite Real Life intervening :)


>> This leads into the discussion of how to pass values back and
>> forth...
>
>There is a more subtle question:  
>
>  Are MUD-world objects a function of the DB or external to the DB?
>
>Translation: 
>
>  As far as the DB is concerned, is it responsible for the contents,
>correctness and relevance of object contents, or is it responsible
>only for the objects themselves?


There are various points along the spectrum... at one end you have a
"database" called either ufs or ext2fs, that stores any number of objects
of type "block".  Only slightly elevated from this primordial level you have
DBM, which maps between a single "key" and a single "datum", each of type
"blob".  Now you don't have to worry about record length, and you can
actually store any number of items, then remove a lot of them, and still
have a somewhat reasonable structure.  But you can't search or sort.

At a much higher level, you have a game engine that stores data of type
"object" which is a container for "attribute" and "method" objects, and has
an interpreted language, etc.  This is not really a database, but you could
probably argue the same about the other two.

I'm not solving the complete "game engine" problem because I don't think I
can solve it myself, this year anyway.  I think I can solve part of it, and
have that solution be somewhat usable to others.


>> Just in terms of how to pass a "block" of data around, some
>> alternatives come to mind.
>
>My approach:
>
>  The DB is responsible for all objects and their storage.  The DB
>hands out read-only pointers to copies of objects it maintains in
>its cache.  
>
>  Translation: The DB only hands out references to read-only
>objects.
>
>  The pointers are further trapped (magic of C++) such that any
>attempt to write via those pointers causes the object to be
>cloned/copied into a new block and the value of the pointer changed
>to point to the new writable object.
>
>  Translation: The pointer is really a handle that the DB
>interactively interprets at call-time into a memory pointer.  The
>pointer is wrapped however so that the caller cannot (simply)
>devolve a reference into a real memory pointer, but as far as code
>flow is concerned the reference behaves in all ways exactly like a
>normal memory pointer.
>
>This all of course ties in with my non-locking model.  As the DB
>owns all allocations relating to objects, upon thread disconnection
>from the DB all related allocations are reaped, handled etc as part
>of the normal tear-down procedure.


This looks like a valid implementation.  I'm probably not going to use C++
so I don't think I'll be able to "trap" pointers... that sounds
complicated.  I think I want it to be somewhat language independent, so
I'll need to do some of the things that the constructors/destructors or
overloaded operators do by defining explicit functions.  Reading and
writing probably won't be as transparent in my model as in yours.
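Without C++ operator overloading, the same read-only/copy-on-write behaviour can be exposed through explicit calls. A minimal sketch, assuming a hypothetical `Store` that hands out integer cookies: `read()` returns a (shallow) read-only view, and the first `open_for_write()` on a cookie clones the shared object into a private copy, so other readers keep seeing the original. Committing that copy back and detecting conflicts is a separate step not shown here.

```python
import copy
import itertools
from types import MappingProxyType


class Store:
    """Hypothetical cache handing out cookies instead of raw pointers."""

    def __init__(self):
        self._objects = {}                 # object id -> dict of fields
        self._cookies = {}                 # cookie -> (object id, private copy or None)
        self._next = itertools.count(1)

    def create(self, obj_id, fields):
        self._objects[obj_id] = dict(fields)

    def checkout(self, obj_id):
        cookie = next(self._next)
        self._cookies[cookie] = (obj_id, None)
        return cookie

    def read(self, cookie):
        obj_id, private = self._cookies[cookie]
        # Show this caller's pending copy if any, else the shared object.
        source = private if private is not None else self._objects[obj_id]
        return MappingProxyType(source)    # top-level assignment raises TypeError

    def open_for_write(self, cookie):
        obj_id, private = self._cookies[cookie]
        if private is None:                # first write: clone (copy-on-write)
            private = copy.deepcopy(self._objects[obj_id])
            self._cookies[cookie] = (obj_id, private)
        return private

    def release(self, cookie):
        # Dropping the cookie discards any uncommitted private copy.
        self._cookies.pop(cookie, None)
```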

Regarding your non-locking model... I read about it and I think it's great
(and modified my db proposal to use it).  What do you do if someone writes
to three objects and the writes fail to commit at the end?  You probably
don't wait for the destructors to be called in order to process the writes,
do you?


>> - DB uses "request cookies" and read/write/copy semantics Instead
>> of just a pointer, the client gets a "cookie" - either a number
>> that the DB can track back to a specific request, or a
>> pointer-plus-ID-number...
>
>And what are your locking/logical_correctness semantics?


At the time of my previous message, I didn't have any :)  Having read about
your lockless semantics, I borrowed from them quite a bit.  :)


>> In order to make the db re-entrant and thread safe, the DB would
>> have to make sure that its routines use proper semantics, but if
>> this is all implemented within the DB module, at least there is a
>> *chance* of making things multi-thread aware :) Whereas if you
>> hand a simple pointer to an external routine, you may never be
>> able to get complete control of thread issues.
>
>This is one of the bits I like about my approach.  Only the kernel
>of the server really needs to be aware of the process model for
>events and data.  Clients, in the form of user-written code need not
>know the slightest detail about that, or for that matter about
>persistence.  It all happens automatically under the covers, always.
>Guaranteed logical correctness and minimum performance.


I think I agree with this.  I haven't implemented it, but it looks quite
logically correct.  There will still be the case where a writeback fails
because data changed out from under it, like in the Dragon's Dinner, but
hopefully the developers of the rest of the engine will have a graceful way
to back off and restart a transaction.  
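The "back off and restart" pattern can be sketched as optimistic concurrency: each object carries a version number, a transaction records the versions it read, and commit applies all of its writes only if none of those versions has changed. Because validation happens before any write is applied, a failed commit touching three objects leaves all three untouched. This is an illustrative sketch with hypothetical names, not J C's actual implementation; the short lock around the commit step stands in for whatever atomic mechanism a real engine would use.

```python
import threading


class ConflictError(Exception):
    pass


class VersionedStore:
    """Hypothetical store: each key maps to (version, value)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()      # held only during commit, never while running

    def get(self, key):
        return self._data.get(key, (0, None))

    def commit(self, read_versions, writes):
        """Apply all writes atomically iff every read version is still current."""
        with self._lock:
            for key, version in read_versions.items():
                if self._data.get(key, (0, None))[0] != version:
                    raise ConflictError(key)
            for key, value in writes.items():
                old_version = self._data.get(key, (0, None))[0]
                self._data[key] = (old_version + 1, value)


def run_transaction(store, body, max_retries=5):
    """Re-run body until its commit succeeds (back off and restart)."""
    for _ in range(max_retries):
        reads = {}

        def read(key):
            version, value = store.get(key)
            reads[key] = version           # remember what this attempt saw
            return value

        writes = body(read)
        try:
            store.commit(reads, writes)
            return writes
        except ConflictError:
            continue                       # data changed underneath us: retry
    raise ConflictError("gave up after retries")
```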


>> But even in this case you probably want an "index" - this time,
>> it's an index by "location".  Your mud code may or may not
>> maintain a constant list of "things that are nearby" - chances are
>> it doesn't.  Also, maybe it makes sense to just look at the room
>> you're in for its "contents" - but ultimately it boils down to
>> "which five records, out of thousands, have the same 'location'
>> value as my player?"
>
>I go for the performance versus storage scheme there.  My data has a
>horrible normalisation factor as objects contain lists not only of
>what they contain, but what immediately contains them.  There are
>also spatially bound objects which establish "ranges" about the
>locations of other objects (eg everything "close" to Bubba), which
>allow for easy processing of that data set (see prior threads on
>neighborhoods for a partial reference).
>
>> "Contents" and "location" are reciprocal, but are also
>> many-to-one.  
>
>Yup.  I have somewhere just under a score of different types of
>containment.


It sounds like "containment", "ownership" and "parenting" are examples of a
type of relationship that occurs a lot in games (as well as other
real-world applications).  I think this is a good candidate for something
you write once and use over and over.  If the location and contents lists
are maintained by "move", then you have to program and troubleshoot the
same logic all over again for chown, chparent, etc., and there is more
chance you will get one of them wrong.
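A sketch of the "write once, use over and over" idea: a hypothetical `ManyToOne` relation keeps both directions (for example location and contents) in step, so move, chown and chparent become three instances of the same tested code rather than three hand-written routines.

```python
from collections import defaultdict


class ManyToOne:
    """Hypothetical reciprocal relation: each child has at most one parent."""

    def __init__(self):
        self.up = {}                       # child -> parent (e.g. location)
        self.down = defaultdict(set)       # parent -> children (e.g. contents)

    def link(self, child, parent):
        old = self.up.get(child)
        if old is not None:
            self.down[old].discard(child)  # remove from the old container
        if parent is None:
            self.up.pop(child, None)       # detach entirely
        else:
            self.up[child] = parent
            self.down[parent].add(child)

    def parent_of(self, child):
        return self.up.get(child)

    def children_of(self, parent):
        return set(self.down.get(parent, ()))


# One implementation, several relationships.
location = ManyToOne()   # move
owner = ManyToOne()      # chown
parents = ManyToOne()    # chparent
```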

However, if you leave this up to the database to handle, you run the risk
that it won't be 100% performance-optimized.  This is also a great example
of something you can get better performance out of if you make it
individual and hand-crafted.  You will have to decide this for a lot of
things, but ultimately, if you have to create it all yourself, there is a
limit to how large a project you can build.

Said another way, some things you will use existing, third party technology
for, like the compiler, the network transport, the file system.  Why not
the database, if a suitable one exists?


>Yes, my objects end up kinda big.

That would be another reason this is great for some purposes but not for
others (size per object is another one of those time/space tradeoffs, as
well as a scalability issue).




