[MUD-Dev] Questions about ... XML as data format

Kwon Ekstrom justice at softhome.net
Mon May 20 10:24:39 New Zealand Standard Time 2002

From: "Adam" <ya_hoo_com at yahoo.com>
> Kwon Ekstrom <justice at softhome.net> wrote:
>> Other than time required, I'd recommend XML.

> I'd very very strongly recommend avoiding straightforward XML
> under Java as a method for storing server data, for ANY
> application where there's likely to be any more than trivial
> amounts of data.

XML is a good solution, and there are options out there.  I
recently took a look at the DOM4J benchmarks, and they show slower
read/write performance than Apache Xerces, although much faster node
access (which is where I've found most of my slowdown).

There are a variety of solutions, including light-weight XML parsers
that don't support all the specs (do you need DTDs, validation, or
schemas?) but run with acceptable speed.
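
As a rough illustration of the "skip what you don't need" idea, here
is a minimal sketch using the SAX parser that ships with the JDK, with
validation and namespace handling switched off.  The <social> element
name and the SocialCounter class are made up for the example; they are
not from any real MUD codebase.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: streams through the XML once, building no tree.
class SocialCounter {
    /** Counts <social> elements (an assumed element name) in one pass. */
    static int countSocials(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(false);      // no DTD validation
            factory.setNamespaceAware(false);  // plain element names are enough here
            SAXParser parser = factory.newSAXParser();
            final int[] count = {0};
            parser.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes attrs) {
                        if ("social".equals(qName)) {
                            count[0]++;
                        }
                    }
                });
            return count[0];
        } catch (Exception e) {
            throw new IllegalStateException("could not parse socials", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<socials><social name=\"smile\"/><social name=\"wave\"/></socials>";
        System.out.println(countSocials(xml)); // prints 2
    }
}
```

Whether a streaming parser like this beats a tree-building one for your
data is something you'd have to measure yourself.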

> E.g., using reasonably up to date version of Xerces (the
> IBM-opensourced XML/java parser), time taken to load a 2mb XML
> file is heading towards ten minutes on low-end pentiums.

Using a mid-range computer, I've noticed about 110ms to load up my
socials file.  I'm not sure what the size was, but I imported the
socials from ResortMUD and had 593 socials...  For a text mud that's
an awful lot of socials.  I'll probably store a lot of my system data
as XML, and areas as well.  My machine is only 450MHz, about what
you'd expect a text mud server to be.
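
If you want to check a number like that on your own hardware, something
along these lines would do.  It's only a sketch: the document structure
is invented (I'm not reproducing the real socials format), it uses the
JDK's built-in DOM parser rather than any particular third-party one,
and a single cold run like this includes JVM warm-up.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical timing harness; names and XML layout are assumptions.
class LoadTimer {
    /** Builds a synthetic socials document with n entries. */
    static String makeSocials(int n) {
        StringBuilder sb = new StringBuilder("<socials>");
        for (int i = 0; i < n; i++) {
            sb.append("<social name=\"s").append(i).append("\">")
              .append("<char>You do thing ").append(i).append(".</char>")
              .append("</social>");
        }
        return sb.append("</socials>").toString();
    }

    /** Parses the text into a DOM tree and returns how many socials it holds. */
    static int load(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("social").getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse socials", e);
        }
    }

    public static void main(String[] args) {
        String xml = makeSocials(593);
        long start = System.nanoTime();
        int count = load(xml);
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(count + " socials loaded in " + ms + " ms");
    }
}
```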

> Bear in mind that to store lots of small pieces of data (e.g. the
> hitpoints for each of many monsters) imposes massive storage
> overhead in XML (one byte of data takes often 100 to store, since
> you have to have an open tag, a close tag, and will often choose
> to have additional attributes too). Note that this is, of course,
> highly compressible. Note too, that to make a small change to the

If you output the data as a number, yes.  You could output it as a
byte array, although that would undercut much of the benefit of using
XML.  The tools available for use with XML are extensive.
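
To make the trade-off concrete, here is a small sketch comparing one
element per monster against all hitpoints packed into a single
Base64-encoded attribute.  The element and attribute names are made up,
and Base64 is just one way to put a byte array into XML text; it's an
assumption on my part, not something the thread prescribes.

```java
import java.nio.ByteBuffer;
import java.util.Base64;

// Hypothetical comparison of two ways to write the same hitpoint data.
class HitpointEncoding {
    /** One element per monster: verbose, but readable by any XML tool. */
    static String asElements(int[] hp) {
        StringBuilder sb = new StringBuilder("<monsters>");
        for (int i = 0; i < hp.length; i++) {
            sb.append("<monster id=\"").append(i)
              .append("\" hp=\"").append(hp[i]).append("\"/>");
        }
        return sb.append("</monsters>").toString();
    }

    /** Packs every value into one Base64 string (4 bytes per int). */
    static String packBase64(int[] hp) {
        ByteBuffer buf = ByteBuffer.allocate(hp.length * Integer.BYTES);
        for (int v : hp) {
            buf.putInt(v);
        }
        return Base64.getEncoder().encodeToString(buf.array());
    }

    /** Compact form: a single element, but opaque to generic XML tools. */
    static String asPackedBytes(int[] hp) {
        return "<monsters hp=\"" + packBase64(hp) + "\"/>";
    }

    /** Reverses packBase64. */
    static int[] unpack(String base64) {
        ByteBuffer buf = ByteBuffer.wrap(Base64.getDecoder().decode(base64));
        int[] hp = new int[buf.remaining() / Integer.BYTES];
        for (int i = 0; i < hp.length; i++) {
            hp[i] = buf.getInt();
        }
        return hp;
    }

    public static void main(String[] args) {
        int[] hp = new int[1000];
        java.util.Arrays.fill(hp, 100);
        System.out.println("per-element: " + asElements(hp).length() + " chars");
        System.out.println("packed:      " + asPackedBytes(hp).length() + " chars");
    }
}
```

The packed form is smaller, but an XSL stylesheet or a generic XML
editor can no longer see individual hitpoint values.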

> file saved on disk essentially requires rewriting the whole file
> (XML is not a clever file format designed to make partial
> rewriting to a file easy). So you soon end up automatically
> splitting your XML file into many small files, and compressing
> each one, and then having buffers to cache reads/writes, and ...

If you design your system to handle these from the start it's quite
powerful.  Personally I don't see why you wouldn't want to split
your data into multiple files: areas each in their own file, players
in their own files, system data in a file, commands in a file, etc.
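
A minimal sketch of the one-file-per-area layout, so that changing one
area only rewrites that area's file.  The AreaFiles class, the
directory layout and the ".xml" naming are assumptions for
illustration only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical storage helper: each area lives in <dir>/<name>.xml.
class AreaFiles {
    /** Writes a single area, leaving every other area file untouched. */
    static Path saveArea(Path dir, String name, String xml) {
        try {
            Files.createDirectories(dir);
            Path file = dir.resolve(name + ".xml");
            Files.write(file, xml.getBytes(StandardCharsets.UTF_8));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads back the XML text for one area. */
    static String loadArea(Path dir, String name) {
        try {
            byte[] bytes = Files.readAllBytes(dir.resolve(name + ".xml"));
            return new String(bytes, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = Paths.get(System.getProperty("java.io.tmpdir"), "mud-areas");
        saveArea(dir, "midgaard", "<area name=\"midgaard\"/>");
        saveArea(dir, "haon-dor", "<area name=\"haon-dor\"/>");
        // Editing one area rewrites only midgaard.xml.
        saveArea(dir, "midgaard", "<area name=\"midgaard\" level=\"5\"/>");
        System.out.println(loadArea(dir, "midgaard"));
    }
}
```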

> So. You're much better off IMHO going with a database of some sort
> to start off with. This is really just an application of an old
> maxim that any app which generates/uses/modifies any significant
> amount of data should ALWAYS use a database - in case it grows too
> large for basic emulation of database features using savefiles.

Databases are a wonderful way to store data, they're
specifically written to handle that.  You can use a hybrid system as
well, such as storing your data in a database in XML format.  You
then gain incremental updates and database indexed searching, along
with the object-oriented structure of XML.

You'd still have to parse the data when you pull it from the
database, and you'd still have to output in XML format.  This would
require you to instantiate an object (or pull an object from a pool
and apply the XML to it as a template... which I recommend if you're
tossing objects out regularly).
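
Roughly what I have in mind, as a sketch only.  A HashMap stands in for
the database table of (id, XML) rows so the example runs without a
JDBC driver; a real system would fetch the row with a query instead.
The Mob and MobLoader classes and the attribute names are invented for
the example.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical game object filled in from an XML fragment.
class Mob {
    String name;
    int hp;
}

class MobLoader {
    // Stand-in for a database table keyed by id (assumption: not real JDBC).
    private final Map<String, String> rows = new HashMap<>();
    // Recycled instances, so frequent load/discard cycles reuse objects.
    private final Deque<Mob> pool = new ArrayDeque<>();

    void store(String id, String xml) {
        rows.put(id, xml);
    }

    /** Fetches one row, parses only that fragment, applies it to a pooled Mob. */
    Mob load(String id) {
        String xml = rows.get(id);
        if (xml == null) {
            throw new IllegalArgumentException("no such mob: " + id);
        }
        try {
            Element e = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            Mob mob = pool.isEmpty() ? new Mob() : pool.pop();
            mob.name = e.getAttribute("name");
            mob.hp = Integer.parseInt(e.getAttribute("hp"));
            return mob;
        } catch (Exception ex) {
            throw new IllegalStateException("bad XML for mob " + id, ex);
        }
    }

    /** Hands an object back to the pool once the game is done with it. */
    void release(Mob mob) {
        pool.push(mob);
    }
}
```

Only the fragment for the object you asked for gets parsed, which is
the point of keeping the rest in the database.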

This setup would allow you to maintain the majority of your object
hierarchy in a database and only load it into memory when the
data is needed.  It gives you the random access that you mentioned
XML lacking, and the optimized searching that relational
databases are well known for, with all of the benefits of XML.

A variety of these methods are used quite a bit in web development,
where you're pulling nontrivial data quite regularly.  I doubt you'd
consider the Microsoft website a trivial amount of data.  It's
written in XML/XSL, although I think they use the XML features of
their database.

XML is a flat-text format designed for use with other tools.  Oh
yeah, another scheme would be an XML import/export with an SQL
native format so you could use standard XML tools with your data.
You'd just have to export it, use the tools, and import it: a painful
3-step procedure.

> The only two exceptions I can think of are:

I can think of quite a few more, but won't get into that now.  I
will say that I've used XML to store much larger amounts of data
than 50 players or 25 mobs will require without a problem.

>  3. You foolishly forget that prototypes become full products
>  against best intentions, and use XML in the prototype, intending
>  to replace it with proper DB access routines before using the
>  system - and then never get time to take them out, until it gets
>  really difficult to remove them, because now your whole DB is
>  stored as weakly structured XML.

> I did number 3. Whoops.

Sounds like you made a mistake and allowed it to take a tool from
your toolbox.  XML is not the end-all be-all; in fact it's a rather
poor solution in many respects... but in just as many areas it excels
above all other data storage mechanisms.  It simply depends on what
you want to do.

Another little something you'll discover is that not all XML parsers
are equal in all areas.  You may have to modify your programming
style a little bit to get the best performance.  Some parsers will
give you the best performance doing single-node selects, where you
have to know exactly what you want beforehand.  Other ones do well by
allowing you to select all the nodes and using an Iterator to sort
through them.
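
The two access styles look roughly like this with the XPath and DOM
APIs bundled in the JDK.  This is a sketch of the styles only, not a
claim about which is faster in any given parser; the XML layout and
the NodeAccess class are made up.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper contrasting targeted selects with full iteration.
class NodeAccess {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    /** Style 1: single-node select; you must know what you want up front. */
    static String messageFor(Document doc, String socialName) {
        try {
            // socialName is spliced into the expression, so it must not
            // contain quote characters; fine for a sketch, not for user input.
            String expr = "/socials/social[@name='" + socialName + "']/@message";
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("bad XPath expression", e);
        }
    }

    /** Style 2: select every node once, then walk the whole list. */
    static int countWithMessage(Document doc) {
        NodeList all = doc.getElementsByTagName("social");
        int n = 0;
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (e.hasAttribute("message")) {
                n++;
            }
        }
        return n;
    }
}
```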

I've done a lot of research on Java XML parsers (although hardly
definitive) and I have a few issues with how some things are done.
I can think of places which are used A LOT that should be optimized
more.  But that's a different thread entirely.

-- Kwon J. Ekstrom

MUD-Dev mailing list
MUD-Dev at kanga.nu
