Chris Gray cg at ami-cg.GraySage.Edmonton.AB.CA
Thu Apr 10 07:43:48 New Zealand Standard Time 1997

:> Perhaps we can come up with a gradient (I'm sure this has been done before!)
:> and we can all point to where we are on it:
:> 1.  native machine code
:> 2.  threaded code
:> 3.  bytecode
:> 4.  parse tree traversal
:> 5.  pre-tokenized interpretation
:> 6.  straight text interpretation
:Would anyone be able to give a short description of all of these?
:(Especially threaded and bytecode) I have been trying to find out about
:threads - i got the pthreads lib for linux, the docs for that are
:impossible to understand without *some* kind of prior knowledge of what
:threads are - I have heard that the linux port of it isnt very 
:good/stable/efficient and martin keegan has gone so far as to advise not
:to use them under any kind of unix..


:Has anyone ever done an analysis comparing the above 6 methods? It would
:be interesting to look at.

Weeeell, since I opened this can of worms... First, note that I completely
made up the above list - it is not official in any way! Second, the
'threading' referred to has nothing to do with threads of execution. Sorry.
Take *all* of this stuff with lots of grains of salt. I've only ever done
4 and 6, myself. Oh, and a bit of 2 many many years ago.

- native machine code: that produced by compilers like C, C++, assemblers
    Stuff that runs directly on the CPU using native CPU instructions.

- threaded code: the usual example here is the language Forth. There are
    variants of this, but here is one. Stuff in memory is typically a
    sequence of addresses, with a few non-addresses mixed in. The addresses
    are pointers into the code of the support framework for the language.
    The system will have a central core, consisting of half a dozen or
    less instructions, that just reads the next pointer in the sequence,
    and branches to it. That sequence will do its thing, perhaps reading
    a word of data from the sequence, then branching back to the central
    core to allow the next operation.  E.g.

	if a < 0 then

    could be represented as:

	@pushvar	primitive: push value onto stack
	A		value: address of variable who's value to push
	@pushconst	primitive: push constant value onto stack
	0		value: the constant 0
	@<		primitive: pop 2 args, compare, push 1 if '<' else 0
	@if		primitive: the 'if' handler
	@label1 	value: address of 'else' part
	'negative'	value: the address of the string constant
	@printstring	primitive: the thing that prints a string
	@branch 	primitive: branch to the indicated place
	@label2 	value: the address to branch to
label1: @pushconst
	'positive'	value: the address of the string constant

    So, you have to sort of 'compile' to this stuff, but it is a lot
    easier than compiling to true native code. You don't care about the
    details of instruction formats and stuff, you don't have to worry about
    linking, and you don't have to worry about object file formats.

    Other forms of threaded code have actual instruction sequences instead
    of the above type of sequence. Subroutine call instructions are used to
    get to the primitives, and they can use the return address to know
    where their operands are.

- bytecode: this is conceptually easier, but still has lots of variations
    possible. You compile to something like the above, but just use
    (typically) single-byte codes to represent which primitive you want
    to use. E.g. for our example:

	00: <numeric code for PUSHVAR>
	01: <some pointer to 'a'
	05: <numeric code for PUSHCONST>
	06: 0 (4 bytes)
	10: <numeric code for LESSTHAN>
	11: <numeric code for IF>
	12: 9 (2 bytes)
	14: <numeric code for PUSHCONST>
	15: <pointer to string>
	19: <numeric code for PRINTSTRING>
	20: <numeric code for BRANCH>
	21: 6 (2 bytes)
	23: <numeric code for PUSHCONST>
	24: <pointer to string>
	28: <numeric code for PRINTSTRING>

    The 'interpreter' of these bytecodes can be written in assembler, or
    in some higher level language like C. Again, you sort of have to
    compile to this stuff, but its fairly easy compilation.

- parse tree traversal: (I use this). This is just a bunch of malloc'ed
    memory, containing records of a union type, that have a type code
    saying what they are (like the bytecode codes), plus pointers to
    further records, or some constant values. Rough ASCII art follows:

			| IF |	   |	 |     |
			       /      |       \
			      /       |        \
			     /	      | 	\
		    ---------	 -----------   -----------
		    |<|  |  |	 |PSTR| ptr|   |PSTR| ptr|
		    ---------	 -----------   -----------
		      /   |
		     /	  |
	     -------- ---------
	     |VAR| a| |CONST|0|
	     -------- ---------

    Just some kind of direct 'parse tree' of the program. The two parts
    of the 'if' would often be nodes representing a sequence of other
    nodes, terminated by one with a nil pointer (linked list of things
    to do). The interpreter just does a recursive preorder (do the left
    most node first) traversal of the tree, executing as it goes. For
    an 'if', it evaluates the condition, and decides which of the two
    subparts of the 'if' to execute. The overhead here is all of the
    extra recursive calls of the interpreting routine. This structure
    is fairly easy to work with, however, such as being 'pretty printed'
    back out to ASCII form.

- pre-tokenized interpretation: the program has been turned into a
    sequence of codes and constants, like in bytecodes, but it hasn't
    necessarily been fully checked for consistency. So, the interpreter
    must continually check that the next codes make sense. Also, the
    names of variables, etc. probably haven't been looked up yet (they
    are just stored as ASCII strings), so the interpreter has to go
    find what they refer to (if they are in fact valid!)

- straight text: as above, but no pre-tokenization into chunks has been
    done. All of the work of scanning the input is done over and over
    again as the code is re-used. Quite slow, but doesn't require the
    design of any extra data structures, magic codes, etc.

Whew! I'm now 15 minutes late for my bus - hopefully I'll catch the next one!

Chris Gray   cg at ami-cg.GraySage.Edmonton.AB.CA

More information about the MUD-Dev mailing list