[MUD-Dev] Re: Why did it take years?

Cynbe ru Taren cynbe at muq.org
Tue Oct 27 19:21:44 New Zealand Daylight Time 1998

[ This started as a personal note, then I decided it has enough
  mud-dev content to be posted.  Skip halfway down if you prefer.]

Adam Wiggins <adam at angel.com> notes:

|    Well, I would imagine any study like that is going to deal largely
|    with 9-to-5 data punchers.  I know that I personally spit out several
|    hundred lines of code a day at work, plus a few hundred more at night
|    at home.  Of course this doesn't count bug-hunting, which can often
|    result in 0 new lines of code in a day.

Heh, more than I usually manage. :)

For me, at least, it depends a lot on the task.  I once did 20,000
lines in 2-3 days by dint of writing about 900 lines of Perl to
autogenerate them.  (Brute-forcing the Marching Cubes algorithm for
extracting a surface from volume MRI data by compiling special-case
functions for all cases.  Result ran limited only by memory

Mostly, I seem to spend my time researching APIs and reading existing
code figuring out how to modify it and such.  Maybe maintaining large
programs isn't conducive to the sort of code productivity one gets at
the start of a development project...

|    Wow!  I'll be very interested to see what you do with this.

I'll add you to my muq at muq.org mailing alias.  Traffic is about 3-12
emails/year, as I announce new releases, so it shouldn't overload you if
you can handle mud-dev. :)

|    Bah!  Just get one of them beowulf rackmounts they are selling now.
|    Each system has 8 666mhz Alpha processors, and you can fit 6 into an 80"
|    rack.  Get a couple racks and you should be in decent shape.
|    Or Corel has their Netwinder beowulfs (communicating by SCSI, nicely
|    enough) which have space for up to 40 processors - and they are
|    hotswappable.  Get a few of those and you'll be fine. :)

*laugh*!  I've been vaguely following beowulf (they were the only
source for 3c590 (?) ethernet cards when I first bought them), but
haven't seen the systems you mention.

http://lwn.net/980507/a/avalon.html says that the Avalon folx found
rackmount not cost-effective.

For now, I'm living on lentils scraping up the last of our down
payment (closing in a week -- http://sandystone.com/house.html
for a pic of our glorious view and rundown new dump :) so the
old dual-CPU PentiumPro box I built for Muq will have to do for
now.  RAM's cheap enough that I put 256Meg in it, which should
avoid displaying the current inefficiency of diskbased operation
too obviously at first...

|    I didn't look at any readmes, I guess I must be a quick read.
|    Actually, it was quite easy for me, as their layout
|    (except for their naming schemes) is very similar to my own
|    style.  In fact, their source tree looked eerily familiar...

Ah!  All totally new stuff to me (I do 3D scientific visualization
but not Quake & kin, as yet), and I couldn't get oriented in the
time I had available.

|    Impressive.  How many from-scratch rewrites did you do, might I ask?
|    (I might add that six years ago I was struggling to learn C...)

Trying to make me feel old?  I learned C on the side while writing
the first Loglan/Lojban parsers, back in '81.  :)

Well, I wrote > 100K of design studies and docs before I started, by
way of orientation, then built it bottom-up.  System doesn't look like
any of the design study stuff any more, but the basic layout has
survived pretty intact.  In particular, I wrote the diskbase module
first, and its API and implementation have survived just fine,
although I have tons of comments on the top on enhancements I'd like
to do.

So I can't say I've done any from-scratch rewrites.

But many individual bits of functionality have seen one or more
complete rewrites.

In particular, the softcode compiler got rewritten several times, for
various reasons:

 *  I always wanted a softcoded compiler, so compiles could timeslice
    with other activity:  This meant a C-coded bootstrap compiler and
    then a complete second production compiler.

 *  I realized part way through that by moving some of the compiler
    functionality into an 'assembler' (loosely named) which guarantees
    to produce valid executables that won't crash the bytecode
    interpreter, the rest of the compiler could consist of untrusted
    user-level code, opening the way to end-user written compilers for
    new syntaxes.  I always love moving creative options from the
    wizards to the end-users, so I considered this well worth rewriting
    most of the compiler internals.

 *  I switched from tinyMuck to Scheme to CommonLisp as my design reference,
    over time.  (Partly because it became apparent the fuzzball maintainers
    would just as soon I stayed out of their sandbox:  An example of something
    not apparent during the initial design phase.)  This changed the underlying
    object representation, which occasioned a fairly major rewrite of the

I've also rewritten the message-stream code several times.  (If you think
of Muq as a prototype of what an object-oriented Unix would look like,
then Muq message-streams correspond to Unix pipes, one difference being
that message-streams can pass arbitrary types of values instead of
just bytes.)
  The message-stream rewrites were simply a symptom of my not having
seen anything similar before, hence learning the practical implications
and opportunities by experimentation as I went along.  Message streams
are also the primary interface between threads belonging to different
users, in the Muq+Micronesia architecture, so various efficiency and
security concerns tend to center on them.
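
A minimal Python sketch of the pipe analogy above, not Muq's actual
implementation: queue.Queue stands in for a message-stream, and the
names `stream`, `END`, `producer` and `received` are invented for
illustration.  The only point is that values keep their types on the
way through, where a Unix pipe would hand over bytes.

```python
import queue
import threading

# Hypothetical stand-in for a message-stream: a thread-safe FIFO that
# carries whole values rather than raw bytes.
stream = queue.Queue()
END = object()  # end-of-stream sentinel (an assumption of this sketch)

def producer():
    # No serialization step: each value arrives with its type intact.
    for value in (42, 3.5, "hello", ("x", 1)):
        stream.put(value)
    stream.put(END)

received = []
writer = threading.Thread(target=producer)
writer.start()
while True:
    value = stream.get()   # blocks until the other thread writes
    if value is END:
        break
    received.append(value)
writer.join()
```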

 *  Issue:  Who owns strings that pass through these streams?
    -> If the sender owns them, the receiver can hoard them and exhaust
       the sender's space quota.  Not nice.
    -> If the receiver owns them, the sender can use up the receiver's
       space quota via various tricks.  Not nice.
    -> In the end, sending most text as blocks of immediate character
       values rather than as strings seems a good compromise:  It is
       then up to the receiver whether to block them into string objects.

 *  Issue: How does one delimit logical blocks of information?  Newlines
    aren't a very satisfactory answer in this context.
       While mindful that Unix does very nicely with unstructured array-
    of-bytes files and streams, I decided that having message streams
    be aware of logical record boundaries was a worthwhile addition.

 *  Issue: How does one handle the scanning of lexical tokens during
    the compile phase of a softcoded softcode compiler?
    -> One wants to read the tokens as a stream.
    -> One wants this to be reasonably efficient -- C code for the
       lowest-level character hacking.
    -> One wants to be able to switch easily from running a user shell
       to compiling and executing code directly.  (I don't think explicit
       edit-compile-eval cycles have any place in the end-user interface.)
    In the end, I built lexing capabilities into the message-streams
    as a mode.  This isn't very pretty, but requiring the usershell thread
    to switch between two different stream types every time it changes
    modes, flushing any queued input from one into the other each time,
    is even uglier.  (Re-inventing SysV streams might be an alternative?)

 *  Issue:  If text is written into the message stream as blocks of
    characters, does it have to be read the same way?  This is often
    a pain, given that the two operations are in different threads
    controlled by different users.
      I wound up with support for multiple reading modes, so you can
    in essence read either entire blocks of values (natural when it is
    a set of related property-value pairs representing something much
    like a subroutine call argument set, say) or else read one value
    at a time, ignoring record boundaries (often natural when
    processing text char-by-char).

  * Issue: If one is to have any pretense of sanity and security, the
    reader of a message-stream must have a reliable way of knowing
    who wrote the information.
      I wound up having message-streams maintain a separate internal
    per-block-of-information slot recording the identity of the user
    who wrote it.  Some message-read functions ignore this, others
    return it.
      Maintaining this field is -much- cheaper than (say) using public
    key signature methods on every inter-user intra-server interaction.

  * Issue:  Sooner or later, one winds up wanting two-way streams, for
    bidirectional communication between two threads in a nice tidy
    package that can be conveniently passed around.  They need to have
    two ends, so that one end can be passed to each of the participants.
    Does this have to be a special type?
      This time, I found a reasonably pretty solution:
      -> My vanilla unidirectional streams now have a 'twin' field
         which defaults to NULL.
      -> To create a bidirectional stream, just create two vanilla
         streams, and then link them via their 'twin' fields.
      -> I rehacked all my 'read' prims to check the 'twin' field,
         and if not NULL, read from the twin instead of the field
      No new classes.
      The two separately addressable ends come for free.
      No new read/write commands.
      Very little extra code in the existing read/write prims.

  * Issue:  Sooner or later, one winds up with a thread wanting to read
    from several message streams, which in implementation terms, means
    waiting in several job queues.
    -> Example: Doing a blocking read from a message stream with a timeout
       reduces to waiting in both a clock queue and the message stream queue.
    -> Example: The Micronesia userdaemons wind up accepting requests from
       both other userdaemons and the usershell thread.  The simple,
       secure solution is to use one message-stream for requests from our
       user, and another for requests from untrusted folks.
    But we do -not- want to encourage novice programmers to write softcode
    spinwaits like:
	  if (q1.hasInput())  readAndProcess(q1);
	  if (q2.hasInput())  readAndProcess(q2);
    CPU time isn't -that- cheap. :)
    In fact, I'd rather it not even be -possible- to write that in softcode.
      So we wind up re-inventing the select() call, which allows a thread to
    wait on several different message-streams at the same time.
      I hadn't anticipated that need, so I wound up doing a fairly major
    rewrite on job queues to support it.
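
Several of the issues above can be put together in one rough Python
sketch.  This is a toy under stated assumptions, not Muq code: the
class `MessageStream`, the helpers `make_bidirectional` and `select`,
and the single shared condition variable `_activity` are all invented
here.  It shows records tagged with their writer, the two reading
modes, the 'twin' redirection on reads, and a blocking wait on several
streams with a timeout in place of a spinwait.  It assumes one reader
per stream.

```python
import threading
from collections import deque

_activity = threading.Condition()  # shared by every stream in this sketch

class MessageStream:
    """Toy one-way stream of records, each tagged with its writer."""

    def __init__(self):
        self.records = deque()  # entries are (writer, deque of values)
        self.twin = None        # None for a plain unidirectional stream

    def write(self, writer, values):
        # One call writes one logical record.  The stream notes the
        # writer itself, so readers need not trust the payload.
        values = deque(values)
        if not values:
            raise ValueError("empty record")
        with _activity:
            self.records.append((writer, values))
            _activity.notify_all()  # wake any thread blocked in select()

    def _source(self):
        # Reads are redirected to the twin when one is linked.
        return self if self.twin is None else self.twin

    def has_input(self):
        return bool(self._source().records)

    def read_block(self):
        """Return (writer, values) for the next whole record."""
        writer, values = self._source().records.popleft()
        return writer, list(values)

    def read_value(self):
        """Return the next single value, ignoring record boundaries."""
        src = self._source()
        _writer, values = src.records[0]
        value = values.popleft()
        if not values:
            src.records.popleft()  # record exhausted, drop it
        return value

def make_bidirectional():
    """Link two plain streams through their twin fields."""
    a, b = MessageStream(), MessageStream()
    a.twin, b.twin = b, a
    return a, b

def select(streams, timeout=None):
    """Block until some stream has input; return the ready ones.

    Returns an empty list if the timeout expires first.
    """
    with _activity:
        _activity.wait_for(lambda: any(s.has_input() for s in streams),
                           timeout)
        return [s for s in streams if s.has_input()]

# Usage: wait on a trusted and an untrusted stream at the same time.
trusted, untrusted = MessageStream(), MessageStream()
threading.Timer(0.05, untrusted.write, args=("guest", "hi")).start()
ready = select([trusted, untrusted], timeout=2.0)
```

A real server would presumably wake only the jobs queued on the stream
that was written, rather than every waiter as the single condition
variable does here.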

In general, my conclusion is that the lots-of-design-effort advocates
are undoubtedly right that fixing a design error early on is -much-
cheaper than fixing it midway through implementation.

But only the naive can believe that sufficient design effort can
ELIMINATE the need for design fixes and changes while implementation
is in progress, on any but the most trivial of projects: The world
keeps changing and experience reveals things hard to predict.  The
waterfall model of development remains a convenient idealization
hiding the necessary complexity of real development projects.

|    > But one needn't aim -impossibly- high [...]
|    Very very very true.  All the things you're saying remind me very much
|    of monologues I've given to my coding teams at the start of projects.
|    Here's what I usually like to do at the start of a big project:
|    Make a list of things that are really, really, really important - things
|    the project just won't work at all without.  Those should be done
|    first.
|    Make a second list of things that are really, really important - things
|    without which the project will suck.  These will be done next,
|    although some of them may remain undone when the product ships.
|    Finally, make a third list of things which are really important - things
|    that would add a whole lot to the project, and which would be really cool.
|    You will never do any of these things.


I think anyone who has been through a major software development team
effort will recognize the truth of that!

