[MUD-Dev] Comments on the DB layer

clawrenc at cup.hp.com clawrenc at cup.hp.com
Tue May 6 18:42:36 New Zealand Standard Time 1997


In <199705070531.AAA16772 at dfw-ix5.ix.netcom.com>, on 05/06/97 
   at 10:41 PM, "Jon A. Lambert" <jlsysinc at ix.netcom.com> said:

>>From: clawrenc at cup.hp.com

><stuff snipped - you know what you spaketh, I hope I remembereth>

There's only one possible answer to that: non compos mentis.

>This poses some interesting problems:  

>1) How long do you keep old objects in the DB.  If your transactions
>    are many you might end up with a large DB of mostly old objects.   

This is a problem with supporting any form of rollbacks.  I could get
*really* nasty and do some form of inline compression, but I doubt it's
worth it (will check tho).  I suspect some form of reverse context
diff would be a lot more profitable.

My base intention is for the DB to run a separate very low priority
thread which does little but scan the DB deleting any objects older
than an XXX configurable time limit.  This means of course that DB
size is dependent on DB activity and not DB contents, but I don't see
a way out of there.
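
A rough sketch of one pass of such a purge scan, under assumptions:
the VersionRecord layout, its field names and purge_old_versions() are
invented for illustration and are not the real record format.  The
latest version of an object is never purged, however old it is.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record header: one stored version of an object. */
typedef struct {
    unsigned long long object_id; /* 64-bit object ID */
    long long written;            /* time this version was stored, in seconds */
    int is_latest;                /* nonzero for the current version */
    int deleted;                  /* set by the purge pass */
} VersionRecord;

/* One pass of the purge scan: mark every non-latest version older than
   max_age seconds as deleted.  Returns the number of records purged. */
size_t purge_old_versions(VersionRecord *recs, size_t n,
                          long long now, long long max_age)
{
    size_t purged = 0;
    for (size_t i = 0; i < n; i++) {
        if (recs[i].deleted || recs[i].is_latest)
            continue;
        if (now - recs[i].written > max_age) {
            recs[i].deleted = 1;
            purged++;
        }
    }
    return purged;
}
```

The low priority thread would just call this in a loop with the
configured limit, sleeping between passes.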

Later thought:  If I make my object formats known by the DB (ie
tightly bind my DB implementation to my object format), then it would
be fairly easy to have the DB only store deltas for the older
versions.  Currently my objects consist of four lists:

  List of parents
  List of attributes
  List of methods
  List of verb templates

with a little blob at the top for ObjectID and other maintenance data. 
It would be fairly easy to make the prior versions of the objects only
contain those members of the list which have changed...
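
As a minimal sketch of that idea for one of the four lists, assuming
list members can be treated as name/value string pairs (the Member
type and reverse_delta() are hypothetical, not the actual object
format): the prior version keeps only the members that are missing
from, or differ in, the newer version.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list member: an attribute, method, etc. keyed by name. */
typedef struct {
    const char *name;
    const char *value;
} Member;

/* Find a member by name; returns NULL if absent. */
static const Member *find_member(const Member *list, size_t n,
                                 const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(list[i].name, name) == 0)
            return &list[i];
    return NULL;
}

/* Reverse delta: copy into out[] only those members of the old list that
   are missing from, or differ in, the new list.  Returns the count
   written.  out must have room for old_n entries. */
size_t reverse_delta(const Member *old_list, size_t old_n,
                     const Member *new_list, size_t new_n,
                     Member *out)
{
    size_t k = 0;
    for (size_t i = 0; i < old_n; i++) {
        const Member *m = find_member(new_list, new_n, old_list[i].name);
        if (m == NULL || strcmp(m->value, old_list[i].value) != 0)
            out[k++] = old_list[i];
    }
    return k;
}
```

This sketch does not record members that exist only in the newer
version; a real scheme would need some marker for "did not exist yet"
to rebuild the old version exactly.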

<thinking>

>2) If the full 128 bits is part of the key your indexes, trees, hashes,
>    or whatever your using, they could get larger and your searches
>    could be longer.  Also longer searches if many old objects from 1)
>    above.

The only things that should ever use a 128bit ID should be those
processes which are explicitly interested in old object versions.  All
the rest (and the default) will be to use the 64bit IDs.  This
transparency will also extend all the way down into the DB.  By nature
the DB will always default to returning and processing the latest
version.  This is automatic to the extent that the DB internally has
to go thru extra mechanics to deal with anything other than the latest
version (ie retrieve latest version data, use that to locate prior
version data, retrieve that, determine if version wanted, repeat as
needed).  
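
A small sketch of that lookup, assuming the extra 64 bits act as a
version stamp and that each record links to its prior version (the
Version struct and find_version() are made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory view of one stored version of an object. */
typedef struct Version {
    unsigned long long object_id;  /* the 64-bit ID everything uses */
    unsigned long long version;    /* assumed meaning of the other 64 bits */
    const struct Version *prior;   /* next older version, or NULL */
} Version;

/* The DB always lands on the latest version first.  Asking for any other
   version means walking backwards along the chain until it turns up.
   Returns NULL if the wanted version is not in the chain. */
const Version *find_version(const Version *latest, unsigned long long wanted)
{
    for (const Version *v = latest; v != NULL; v = v->prior)
        if (v->version == wanted)
            return v;
    return NULL;
}
```

Callers that only hold the 64bit ID never enter the loop body more
than once, which is the sense in which the latest version is the cheap
default.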

I'm hoping for little performance impact on the rest of the world due
to the ID length change.  All the cost will come from the general
transaction and rollback support.

Hurm.  I guess there will be a slight expense due to the latest
objects now being scattered over a much larger file, thus increasing
disk head motion...  <shrug>

>3) I can see how you would get numbers of killed mobiles by checking
>    how many old objects of the type were dead.  I don't see how you 
>    XREF with the weapons or spells, unless you store this info with
>    the dead mobile object or the weapon or spell object undergoes a 
>    state change requiring it to be stored with the same transaction 
>    time. 
>
>    Perhaps logging certain events might be easier, though limited 
>    because you are guessing at what your potential queries will be.

Note: This is to generate a list of all the players who killed XXX
mobile in the last couple weeks, and for each of them also list what
weapons and spells they used in the fight. The reason to do such
things is to investigate and repair game balance.

Currently to do the above I would:

-- Locate the class which defines the XXX mobiles.

-- Iterate across all prior versions of that object for the requested
time period.

-- From each prior version extract its list of instances.

-- Remove duplicates from the extractions.

-- Iterate across the list and record the transactions which deleted
them.

-- Iterate across those transactions and list all the player objects
referenced by the transaction.

-- etc...

I can move backwards along the player object-version line, I can
examine their inventory.  Heck, if I also store transaction owners
with the transactions (probably a good idea), I could actually
recreate and watch the whole fight, blow by blow as it happened (along
with watching Bubba call in his Wiz friend to...).  Just roll the DB
back to that time, and replay.  It makes snooping a thing of the past.

Personal evaluation of the implementation: Not pretty.  Requires
intimate knowledge of the DB and server design.  Does work.  Blech.  

>I have 64-bit ObjIDs and they are generated by the RDB now
>(convenient and consistent, but some overhead on object creation).  I
>use a timestamp field in  the RDB, also automatic but it is not part
>of the "loaded" object.  It exists  solely in the RDB and is very
>efficient.

What does the timestamp give you?

>Class Versioning happens through SQL DDL.  Attributes that are
>removed  are removed from all instanced objects.  Attributes that are
>added are added to all objects as nulls.   Methods reside in the
>class along with class instance attributes.   (That ColdC vs "real
>OOP" thing we discussed  earlier ;-)  )  

If added attributes default to NULL, how do you propagate an attribute
value to all children/instances?  Similarly, how does this work for
methods?

>...Versioning can be expensive
>if done late in a class's life, but  this is part of interactive
>programming and not a runtime thing.

And the expense is due to the fact that you now have two or more
versions of the same base class, each with its own collection of
instances?

How do you handle the case where you want to propagate a change, say
an added/changed method or attribute, to all current instances?  As I
understand your current system making the change to the class defines a
new version of the class and only affects new instances of that class. 
Old instances continue to behave as versions of the old class
(pre-edit).

>I've been having a real bitch of a time with the DB recovery thing
>myself. This is distantly related to your transactional recovery, I
>think.   I have been trying to keep a log that contains TranID,
>ObjectImage which is interspersed with Tranid Commits finally Object
>Cache to Disk Commits  (ala DB2).  The theory is that if I pull the
>plug on the machine, upon reboot I  can read the log back to the last
>Object Cache to Disk Commits that are  encompass completed
>transactions (assuming the disk head doesn't take a big bite). 

This is pretty close to what I'm attempting (tho I had no idea that
DB2 did it too -- I just thunk it up one night).  My idea is to run a
separate database, a simple ISAM pretty well, for the transaction log. 
Log entries would be of three types:

  Start of cache commit.
  Specification for a given transaction  
  ...(may be many of these)...
  End of cache commit

Where the "EndOfCacheCommit" is only written to the log upon
successful writing of all component transactions.  Then, for the DB in
recovery mode it's just a question of rolling the log back to the
last EndOfCacheCommit statement and cleaning the DB of any later-dated
changes.
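
The recovery scan could look roughly like this, assuming the log can
be read as a flat array of entries (LogEntry, the enum names and
recovery_point() are invented for the sketch):

```c
#include <assert.h>
#include <stddef.h>

/* The three kinds of log entry described above. */
typedef enum {
    START_OF_CACHE_COMMIT,
    TRANSACTION,
    END_OF_CACHE_COMMIT
} LogType;

typedef struct {
    LogType type;
    unsigned long tran_id;  /* only meaningful for TRANSACTION entries */
} LogEntry;

/* Returns how many log entries survive recovery: everything up to and
   including the last END_OF_CACHE_COMMIT.  Entries after that point
   belong to a cache commit that never finished and must be undone.
   Returns 0 if no cache commit ever completed. */
size_t recovery_point(const LogEntry *log, size_t n)
{
    for (size_t i = n; i > 0; i--)
        if (log[i - 1].type == END_OF_CACHE_COMMIT)
            return i;
    return 0;
}
```

Everything in the DB dated later than the entry just before the
returned index is then what gets cleaned out.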

The DB itself is also pretty standard: the old business of directory
and map blocks interspersed with data regions, with the headers for
each record placing themselves in a linked list of prior/later
versions of that object.

>Two problems are apparent.  The log buffer may not be completely
>flushed. This I can handle since I can rollback to the previous
>Object Cache to Disk Commit updating the RDB with the last valid
>ObjectImage.  The other problem is rather embarrassing.   My lovely
>OS decides that files open for write access at the time of crash are
>no longer viable.  There must be a way around this.  My trusty
>mainframe never made this decision.  I don't really want to keep
>closing and reopening a log file.  Perhaps I've missed a simple
>concept here?

Odds to dollars your OS is not quite this stupid (I guess we're
talking Win NT here, so it actually may be pretty likely).  Typically
the real reason for the problem is that the filesystem ends up in a
potentially inconsistent state due to data having been written to the
file without the directory entry being updated to reflect that (ie
data nodes in the file system are committed for the file, but other
entries in the filesystem (ie directory entry) invalidate that
assignment).  

Re-opening the file every IO and then closing it to keep everybody in
sync is pathetically expensive.  The standard solution is to run file
IO's thru a dup()'ed or dup2()'ed handle.  

  ie

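  // Early in your code:

  errfile = open (whatever);

  void errprintf (whatever)
  {
    // Write thru a duplicate handle and close only the duplicate;
    // errfile itself stays open the whole time.
    tempfile = dup (errfile);
    write (tempfile, whatever);
    close (tempfile);
  }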

This way the directory structures get updated for every file IO and
the filesystem can pretend to maintain itself in an internally
consistent state without having to reopen the file for every IO.  

--
J C Lawrence                           Internet: claw at null.net
(Contractor)                           Internet: coder at ibm.net
---------------(*)               Internet: clawrenc at cup.hp.com
...Honorary Member Clan McFUD -- Teamer's Avenging Monolith...



