[MUD-Dev] Garbage Collection

Miroslav Silovic silovic at zesoi.fer.hr
Thu Dec 9 13:31:37 New Zealand Daylight Time 1999


Cynbe ru Taren <cynbe at muq.org> writes:

> In Muq, I'm currently using a plain-jane mark-and-sweep monolithic
> gc I hacked together one weekend just to have something rather than
> nothing.

Hmm. :) Problem with mark&sweep is bad realtime performance. You can
look to 1-2 seconds pause per 10MB allocated. Not too bad unless you
have *lots* of active objects or are running realtime combat. See
BattleTech MUSHes for realtime combat example (and the ammount of
annoyance 1 second lag can cause).

> I've been contemplating going to a two-generation system and/or
> implementing Dijkstra's three-color incremental algorithm.

Generational GC tends to require the same infrastructure as tricolor
(i.e. write barrier).

> Is there prior mud art with other approaches?  Are there working
> systems or good references I should be boning up on before proceeding?

No MUD systems that I know of, but there are plenty of programming
languages implementations that use good GC systems. Note that GC has
one serious drawback: it tends to touch all the pages it scans. While
it's possible to check whether the page is on swap (and then try and
be clever about scanning the pointers from it), GCs in general don't
do this, and so GC systems don't perform well when the machine begins
to thrash.

My favorite GC implementations are Hans Boehm's (major problem: uses
VM tricks to handle generationality), rscheme's (check
www.rscheme.org) and the one in TOM (incremental, nongenerational for
now, and with very clean interface to the object system that allows
objects a good degree of control over the way they get collected).

So here are the MUD/GC issues:

 - Using well-implemented GC results in a code that doesn't have to
worry at all about memory bugs - you probably want conservative stack
scanning if you're doing *any* embedded C, but you can almost always
scan heap exactly. Also, you can use circular datastructures without
limits, work with graphs in a general manner (refcounting tends to
break with general graphs) and keep all the memory issues separate
from your object system (i.e. no worries where/when
constructors/destructors get called). You also have symmetry between
objects pointed from stack and the objects pointed from heap - this is
VERY helpful for design. Note that GC may treat stack and heap
pointers differently, the symmetry is in the GC interface.

 - Write barrier... This is problematic part. In C, calling write
barrier for heap objects means you have to do

	GC_ASSIGN_POINTER(object->foo, another_object);

or

	object->foo = another_object;
	GC_LINK(object, another object);

(you need to know inter-generational pointers in gengc, that's what
the second parameter is for).

rather than just

	object->foo = another_object;

This is easy to forget. C++ with some templates has trivial workaround
(just implement write barrier into assignment operator) but will tend
to call write barrier WAY too often. Alternatively one could probably
program a lint-like utility to statically check for this problem.

 - swap interaction. This topic, to put it plainly, sucks. One way to
treat this problem is to treat parts of your object tree as separate
systems and collect them using distributed GC technique - it handles
one hard problem by turning it into another hard (but more researched)
problem.

 - one-bit refcounting: Some datastructures are VERY transient in
nature - strings, for instance, or closures when you use them just to
avoid enumerators. To handle them, you give them one bit refcount: You
know when these will disappear from stack, so you can use a single-bit
flag that checks whether this object is referenced from heap. If no,
once it disappears from the local scope, you can kill it. This doesn't
suffer from threading interactions typical for refcounting, and can
drastically increase performance if you allocate your strings on a
GC'ed heap.

 - distributed gc: This becomes relevant if you're running your mud on
several servers. I haven't thought much about it, apart from noting
that TOM supports somewhat limited form of it (read: VERY
fault-intolerant).

More information on this topic: ftp://ftp.cs.utexas.edu/pub/garbage/


--
How to eff the ineffable?



_______________________________________________
MUD-Dev maillist  -  MUD-Dev at kanga.nu
http://www.kanga.nu/lists/listinfo/mud-dev



More information about the MUD-Dev mailing list