[MUD-Dev] From Devmud: Database module, draft 3

Greg Connor gconnor at nekodojo.org
Sat Jan 16 18:32:16 New Zealand Daylight Time 1999


As before, I'm cross-posting this to both Devmud and Mud-dev.  Devmud
readers, I am proposing that "generic database" might be useful to you when
completed, so your feedback is appreciated.  Muddev readers, you may or may
not be interested in tracking the progress of this project, but it is also
of general interest to folks who might be building a mud from scratch and
looking for tools to use as components.  Plus, many of you have done this
before, so your feedback is especially valuable to me.

Additional versions after this one will probably be more similar than
different, so I may not continue cross-posting to both places unless there
is continued interest.


Changes this time

Terminology:  The container that holds all the records of a given type is
now called a "table" (I had been calling this a "database").  This is to
translate better into SQL-like concepts and provide less confusion when
dealing with actual SQL (later).

db_table (formerly DB, the handle to the table) is a simple type for now,
but may be a struct type later, so it is changing from "return-this" to
"pass reference to where I should put it".  This also allows the db
functions to return db_result as well as a handle to the table.  You
allocate a local db_table instance (not a pointer, though you can pass the
pointer around) and pass by reference to db functions.  Though what's in
the block may be an integer or simple type, don't depend on it (don't
blind-cast it to long int or anything).

Fields and requests should be a little more well-defined this time.  

Most everything we pass back and forth will be a param block (actually,
pointer to one).

Everything defined here will start with "db_" to avoid conflicts with other
modules.  If we go strictly by module name, these would be "database_" but
db_ seems easier to read, and doesn't make the line go off the edge of the
screen when it appears three times on one line :) 


Introduction

As previously mentioned, I am working on the first incarnation of a
"database module".  This module will be based on gdbm, and will probably be
the typical "low end" (i.e. free) database solution.  

Along with the first database, I am working on the "generic database
interface" - the idea being that the database module is "generic" if it
uses this interface, and that means it can be exchanged for another
"generic" database later if the developer chooses.  I expect most
developers will use the generic interface.  (Some may choose to modify the
database further and create their own non-generic functions... which is
fine too; it just means the module can't be exchanged for a different one
later :)

With this in mind, I want the generic interface to meet *most* of the needs
of other module developers, but it may not meet everyone's needs out of the
box.  Still, I am interested in hearing about any features I may have
missed, so I can make a list, and either stick it in, or leave it for an
"enhanced interface" later.


Road map: data type overview

A collection of data is arranged in multiple "tables", which consist of
"records" (rows) and "fields" (columns).  You open a table before doing
anything with it, and close it when you're done.  

You get stuff from the database by placing values into a "request" block.
Requests contain instructions about which fields you are searching, and how
to sort the results, and when the request is complete, the same request
block will also contain results: how many records were selected, and
whatever info dbmodule needs to allow you to "walk" the list (probably just
an array of record-id's).  

Once you know the ID of a particular record you want (either by searching,
or by a reference from some other record), you fetch a copy of the record
by its ID.  (Note that "record" refers to an actual field in the database,
and also to the in-memory data structure you use to get info about that
record)  A "record" (the data structure) is an in-memory header that tells
how many fields there are, whether the values are loaded, and if they are,
contains pointers to the values and their length in bytes.  A "record"
structure can be your own private copy of the data, or the actual structure
used by dbmodule's internal cache, if you are permitted to read directly
from the cache.  

Records contain pointers to "fields" which are just blocks of data of a
known length, which your program is expected to manipulate.  Dbmodule
"knows" whether these fields are supposed to represent integers, floats,
strings, or some other foreign type.  If it is a simple type that dbmodule
knows how to deal with, and if you request dbmodule to "index" that field,
you get the extra ability of being able to search or sort based on those
fields.


Road map: functions and usage descriptions

Opening and closing tables

  db_result db_open_table ( db_table *mydbtable, char *mypath );

db_open_table opens a table for use, based on its pathname, populates the 
db_table block you pointed to, and returns a successful result.  
db_open_table should not be called for a database that is already open (this 
will probably return a "permission denied" error or something).

  db_result db_close_table ( db_table *mydbtable );

db_close_table closes the database (syncing and flushing as needed) and
frees all its resources, like its cache.

  db_result db_new_table ( db_table *mydbtable, char *mypath );

db_new_table is called like db_open_table , and creates a new (empty)
table in the specified file, and leaves it open (returning DB, same as
db_open_table).  You may not create it new if it already exists... you must
delete it first if you intend to replace it.  (Some implementations may not
allow you to create tables.  If so, there will be documentation about how
to do this manually for your dbms, as well as any other pointer or
reference files you need to make).

  db_result db_delete_table ( char *mypath )

db_delete_table removes the data and deletes any files that were created by
db_new_table or any other files that dbmodule created to store this set of
data (like index files, etc)  You may not delete it while it is open...
close it first.  (Some implementations may not allow you to delete tables.
If so, there will be documentation about how to do this in your dbms).  


Making requests

  db_result db_submit_request ( db_request* );

You build the db_request structure in your own space (whether it's malloc,
local, or static is up to you).  You fill in db_request fields for:
  How many fields are in the criteria
  For each field, what field number, what target data, and what
relationship your field has to the target (for example, "Field 0 less than
or equal to 5")
  How to sort the data
  How many results you want (at most)
  How many results to skip (useful if you are submitting the same request
again because you didn't get them all the first time)
  Where to put the results (and the size of the buffer you are providing).

After submitting the request, dbmodule fills in more fields in the request
block, to tell you:
  How many records match the criteria
  How many records were returned (in your buffer)
  The actual results
  Whether there are more results after this
If there are more records than you requested (or more than could fit in
your buffer, if less) then you can either submit the request again,
specifying to skip the number of records you have already traversed, or you
can submit again with a larger buffer and get "all" the results.  

(Skipping some and calling again to fetch the next chunk may be faster if
you're doing something interactive, and is especially useful if you know
you're not going to need the entire set of results.  However, if the data
changes in between submitting the first and the second time, you may skip
over some, or you may get the same results again.  If you need a reliable
traversal that visits each record exactly once, you should submit the
request with a large enough buffer to hold all the results.  If you want to
know the size of the buffer you should allocate, submit with a limit of 0
to just learn the number of matching records, then submit again with the
appropriate sized buffer).


Reading and writing data

There will be at least two different methods for getting the actual data... 

Method one: immediate fetch to shared memory (known as "direct")

This method returns a pointer to wherever the data lives in dbmodule's
private cache, and does not copy the data for you.  

This method is a little more time-efficient than giving callers their own
copy of the data, and should only be used if an entire project is
single-threaded and concurrency isn't going to be a big issue.  (Your
program may still "emulate" something that is threaded (such as allowing
execution of in-game scripts to be scheduled and backgrounded) as long as
each "single step" is a separate unit of work and cleans up after itself
before resigning control.)

To read a record:
  db_result db_direct_readable_record ( db_table*, db_id*, db_record** );
  void db_direct_release_record ( db_record* );

Even though this appropach is not very suitable for multi-threading, this
method still records a "read lock" on that record on behalf of your
function.  This is to protect your program from reading already-purged
memory; in cases where your function gets additional records, or calls a
sub-function that gets multiple records, which otherwise might cause
previous records to be purged. That is, records won't be purged from the
cache while you still have a pointer to them.  This safety measure also
keeps you from writing to a record if you have already fetched it for
reading in some other part of your code (such as higher in the calling
chain) in an attempt to preserve database integrity.  

Your function must release the lock when done (this must be done even if
your function bails out prematurely due to an error).  If you don't release
the lock, other calls will be able to read the record, but not write to it,
and if this happens to a lot of records, the cache will eventually become
full of locked records and (if there is a hard limit on cache size) reading
more records will start to fail.  (dbmodule will record the calling address
of functions doing the locking, and if debugging flags are set, can report
the record ID's and addresses of the functions which did not release their
locks at table close time)

(Note that these locks are not "real" locks at the database backing-store
level... they are designed to keep dbmodule and the calling program from
conflicting with itself.  They are not designed to protect data integrity
when more than one program opens the same database.  You should only use
method one on a database that will be opened by dbmodule exclusively; if
you're using a client/server database backing store like Sql, make sure
that no other clients have access to the database, and that you never run
multiple copies of your program at once.  Dbmodule may return an error if
you try to open a database already in use by another copy of your program,
on that same machine, in that same directory, but this is not guaranteed,
and dbmodule cannot detect copies of your program running in different
directories or on other machines.)

To write a record:
  db_result db_direct_writeable_record ( db_table*, db_id*, db_record** );
  void db_direct_release_record ( db_record* ); //yes, this is the same for
both

To write to a record, you register a write lock, and modify the data, then
ask for it to be written back, and the write lock released.  For strings or
other structures of varying length, you will either build the new string
elswhere in memory and call a dbmodule routine to have it copied into the
record, or you call a function requesting that the existing storage be
resized (and truncated or padded as needed).  This is needed because the
storage is allocated by dbmodule; if you substitute your own pointers,
dbmodule will try to free the wrong memory, resulting in either a crash or
leak.

You may open a record for reading from multiple calling functions, provided
it is not locked for writing, and you may open a record for writing from
one *single* function, provided it is not locked for reading or writing.

Method two: copy on read, multiple conditional write back

  long int db_get_entire_record_size ( db_table*, db_id* );
  db_result db_get_entire_record ( db_table*, db_id*, buffer*, long int
buffersize );

With method two, you get your own private copy of the data, so there is no
need to establish locks.  To get a record (including all its data), you
first learn the size of the data to expect with db_get_size, so you can
allocate it with malloc (or if using stack or static storage, check to make
sure your buffer is big enough).  You then request the record to be copied
into your storage, with db_get_record.  For safety, you tell db_get_record
how big your storage is when calling db_get_record, in case it changed
since you did malloc() or in case you are using a static buffer of a
certain size (if the buffer is larger than the data to be returned, that's
fine)

The buffer you get back will contain a db_record at the beginning, and the
data for the fields will be packed in the buffer after that.  Thus the
db_record structure at the beginning of your buffer will contain pointers
to elsewhere in the same buffer.  (This is a shortcut to get all the data
in two function calls.)

The alternative method is:
  long int db_get_record_header_size ( db_table*, db_id* );
  db_result db_get_record_header ( db_table*, db_id*, buffer*, long int
buffersize );
  long int db_get_field_size ( db_record*, int field_num );  
  db_result db_get_field ( db_record*, int field_num, buffer*, long int
buffersize );

This method allows you to just get some of the fields, if you want to save
on local memory and copying (though it will be more function calls to get
the data you need).  Depending on how many fields you have, this may be
more efficient; ie. it may be better to make two or three small
transactions, rather than one large transactions, if the sum of the small
transactions is still smaller than the one large transaction... this is
especially true for network data sources.

(There is no guarantee that the particular dbmodule implementation you're
using will break queries up into smaller queries... it may still load the
"whole record" into its cache as a chunk.  So, this may or may not be an
"optimization" depending on the implementation.  Usually if the backing
store is relational and network-accessible, dbmodule will only fetch the
fields you need, and if the storage is local and flat, like gdbm, records
will always get fetched to the cache completely.  In general, if you have a
lot of large fields and you need less than half of the data, you should get
them one field at a time.)

To write changes to a record, you take your copy of the record and modify
it, mark the record as "has been modified", and submit the modifications:
  db_result db_write_changed_record ( db_record* );
Dbmodule knows what the record ID is, and can tell whether someone else has
changed the record since the time you read it.  If the record has changed
in the database since your readable copy was made, the attempt to write the
new data will fail, and you will need to read the new copy, modify it, and
write it again.

The fields here are using your own storage, so be careful that you don't
write past the end of a field.  (If you read the record in to one single
buffer, you're probably better off just changing the internal pointers so
they point to new memory, alloc or local.  When you write your changes,
dbmodule obeys the pointers, and doesn't care whether they're still packed
into a contiguous structure.)

If you have changed multiple records, and you want to commit changes at the
same time, you build an array of pointers containing all the elements of
the transaction, and call: 
  db_result db_write_changed_records ( int num_records, db_record*[] );
Changes will only be written if ALL the changes in the block are guaranteed
to write, ie. if none of the records have changed since those copies of
them were checked out.  Similarly, if you read some records and are relying
on their values to stay the same when writing to other records, you can
include them in the block... they are not written, but the group of writes
is conditional on the entire set not having changed.  This is based on the
description of "lockless db semantics" as originally proposed by JC Lawrence.

Example:  You are processing a transaction like "move dragon D from room A
to room B".  You need to check the status of exit E to see if the way is
clear, then modify D's location, remove D from A's contents, and add D to
B's contents.  You are only changing A, B and D, but you could produce an
improper or inconsistent result if the value of E had changed since you
checked it.  So, you include all four records in your "conditional commit"
and only three of them are marked writable.  If a concurrent thread had
succeeded in locking the door E, and THAT change successfully commits, then
this would cause your transaction to be invalidated; your write fails, and
you start the transaction over again.


Creating and deleting data

  db_result db_create_empty_record ( db_table*, db_record** );

For creating a new record, you call "db_create_empty_record" - this will
allocate a db_record structure, and return its address.  The new structure
will contain a new unique db_id, and all other fields will be null.  To set
its data, modify the record and submit it for writing as you would normally.

If you are using the "direct" method, you will probably want to call
  db_result db_direct_create_record ( db_table*, db_id*, db_record** );
which creates a record AND sets a write lock on the record, so you don't
have to extract its db_id and request a second write-locked copy.  If
you're using conditional commit, ignore this.

To delete a record from the database, call
  db_result db_delete_record ( db_table*, db_id* );

(Still to do: we probably need to be able to create and delete as part of a
multiple-commit transaction, and we need to make sure that the correct
memory model is being used here for new record structures)


Adding fields and changing table structure

You can add "fields" to the table definition at any time; if you have
existing records, and add a new field, existing records will have a "null"
value for that field.  Fields are numbered starting with 0, and having more
fields increases the size of the record header.  Dbmodule remembers the
name and the type for each field, and will repeat the field definitions
back to you if requested, but most of the time you will just refer to
fields by their number: 0 through (number of fields - 1).

Deleting a field is not supported, since that would involve "renumbering"
fields.  You should just mark that field number "unused" and possibly write
null values to that field if you want to shrink the records.

(Functions in this section are not defined yet)
(Get number of fields, get field descriptions, get field types, change
field types, add a field)


Checkpoint/flush - Functions not identified yet

Cache reporting - Functions not identified yet


Type definitions

Some of the type definitions are up in the air, so use your imagination, or
make a suggestion :)

db_table - Some identifier that needs to be passed back to the module for
any other operations on the database (this may either be a pointer to a
structure that dbmodule maintains, or just an index number or other
"cookie" that dbmodule interprets)

db_result - Probably an enum, or could be an int.  Either way, there will
be some symbols provided in the header file to compare Result to, like
Success, or "pseudo-error" conditions like Duplicate, NotFound, or "real"
errors like FileNotFound.  Alternately, "Result" could be changed to
"errno" where 0 means "success" and other numbers mean various failure
conditions.

db_id - A value issued by the dbmodule when an item is first created.  It
is guaranteed to be unique among objects in that same database (at least
among all currently existing objects).  It is suitable for storing inside
another record if you want to build relationships between objects.  This
will probably be a 32-bit unsigned integer, and it could be something
generated by the dbmodule code, or something from the underlying database
back-end (like sql's "row id").  Items in the database may not necessarily
be in ascending or descending order, will not necessarily start at 0, and
some ID's may be skipped and not exist at all.  ID's are intended to be
compared "equal" or "not equal" but whether it is less-than or greater-than
doesn't imply anything about its age, nor does the difference tell you how
close the objects are in the db or in creation time.  (If these limitations
on "ID" are not suitable, client programs are free to create their own "ID
number" as a field within each record, which can be sorted, renumbered,
recycled, traversed in order, searched, etc.  ID is intended to be a handle
passed back and forth to the database module, but if you want to render
ID's in a human readable form at some point, you should consider making
your own field for this)

Path - A string pathname identifying a database file.  Path may not be the
name of an actual file... the dbmodule may use the path as-is, or add an
extension to it, as long as it is consistent.  It is recommended that the
pathname passed in not contain any extension, so the dbmodule can add its
own extension if needed.  (In fact, the dbmodule may open other files in
order to carry out its duties, and if so, they will use the same path with
a different extension).  If the path is not fully qualified, the file will
be looked for in the current working directory of the process.  Note: Some
database implementations may not store your data in a file, and may instead
communicate with a third-party package to store the data (possibly on
another computer on the network)... in these cases, the "path" may just be
an identifier string (like "ScoobyMud.Objects") or may be a file where
dbmodule stores some local instructions to itself... in any case, no other
module should monkey with the files created by dbmodule, and you should not
assume that an actual file will exist at the "Path" given.

Field - This is the actual "data", which can be a struct or union, or a
simple type like a string, or a block of binary data.  Dbmodule doesn't
care what the format of the data is, except in cases where you want to
"index" data based on a certain field (ie. search for or sort/traverse by
that field).  You also pass "Len" (length) along with a Buffer so the db
will know how big your structure or other data is - and dbmodule will hand
Len back to you when reading the data.  If a field is supposed to be sorted
or searched like a number, and it's too short or too long, dbmodule will
treat it as an invalid value and probably sort it to the end of the list.


Limitations

Database files are not interchangeable between different implementations,
and no conversion tools are provided.  "Generic" means you don't need to
rewrite large parts of your code to make it work with different back-end
engines... but it does NOT mean that your data will be preserved if you
change engines.  That is, it is generic to the developer and integrator,
and not to the end user :)

It is possible to write a converter, to read from one type of database and
write to another, but such a converter would need to know the format of the
data it is converting, mostly because the new database will probably have
new and unpredictable ID numbers.  Alternately, you may choose to write all
your data to a flat file, then read it back in after changing databases.  I
am not planning on doing either of these... for now it is assumed that you
will choose a particular db module before you start composing and storing
data you care about :)

The generic database is not "Relational" - meaning that if records contain
references to other records, either in the same database or another
database, Dbmodule doesn't manage these relationships for you.  For
example, you can store a field for "parent" in your structure, but Dbmodule
will not recognize it as a reference to another object.  If you delete the
parent object, the child may still think it has a parent, when it really
only has an invalid ID.  It is the client's responsibility to manage
relationships, and update related records as necessary.

Challenges...

First, gdbm has no concept of "sorting" or "searching"...  it only allows a
"traversal" of all the records, visiting each one once, but in no
particular order (based on gdbm's internal hash) So, this will need to be
done with separate gdbm files, or some other way.  If anyone has done this
kind of thing with dbm/gdbm I would be interested to hear about it.

Anyway, dbmodule will create and manage the exra files as needed, and as
far as the client is concerned, the "indexes" will all be part of the
single "table" (ie. even though there are multiple gdbm files, you don't
have to call open_database on each one... they are all opened/closed either
at the same time, or as needed.)  It is not yet clear to me how sorting or
searching will be implemented, but I think this is pretty core to the
concept of a "database" and if it is not provided, people will be forced to
implement their own which may not be portable.  

After gdbm, other third-party back-end engines will probably have built-in
ability to do this, so I would like to do it in such a way as to make it
easy and transparent for dbmodule-future to implement our standard
interface, and repackage the call to a SQL or other db pretty easily.  


Cache impl. notes

Table:
  Path, gdbm_info, Cache, 

Cache:
  ID_Btree, Locked record list, Unlocked record list

ID_Btree:  (b-tree ordered by Record_ID)
  Record...

Locked records: (linked list within Record)
  Record_ID, Record, LockRequests, next_item

LockRequests: (linked list)
  Request, next_lock, prev_item

Unlocked records: (linked list within Record)
  First item, Last item
  Last used timeticks, next_item, prev_item

Record:
  Record_ID, Field list

Field list: (array)
  Length, Field

Field:
  Data


GDBM implementation notes

How we store stuff in the db
db_data = ( field_count, field_header[field_count], field_data )
field_header = ( offset, length )
field_data = ( block )

How to unpack into cache

How to pack cache into db








More information about the MUD-Dev mailing list