Jeff Kesselman jeffk at tenetwork.com
Thu Apr 10 18:58:48 New Zealand Standard Time 1997

You left out a true virtual machine with a compiled assembly code i nits
own assembly langauge (typicly a stack machine of soem kind).

These VMs come in two varietoes these days, traditional stack machines liek
the old pascal P system, adn object orienetd stack machiens like the JAVA
machine (and  my Kelvin VM).


At 07:43 AM 4/10/97 MST, you wrote:
>:> Perhaps we can come up with a gradient (I'm sure this has been done
>:> and we can all point to where we are on it:
>:> 1.  native machine code
>:> 2.  threaded code
>:> 3.  bytecode
>:> 4.  parse tree traversal
>:> 5.  pre-tokenized interpretation
>:> 6.  straight text interpretation
>:Would anyone be able to give a short description of all of these?
>:(Especially threaded and bytecode) I have been trying to find out about
>:threads - i got the pthreads lib for linux, the docs for that are
>:impossible to understand without *some* kind of prior knowledge of what
>:threads are - I have heard that the linux port of it isnt very 
>:good/stable/efficient and martin keegan has gone so far as to advise not
>:to use them under any kind of unix..
>:Has anyone ever done an analysis comparing the above 6 methods? It would
>:be interesting to look at.
>Weeeell, since I opened this can of worms... First, note that I completely
>made up the above list - it is not official in any way! Second, the
>'threading' referred to has nothing to do with threads of execution. Sorry.
>Take *all* of this stuff with lots of grains of salt. I've only ever done
>4 and 6, myself. Oh, and a bit of 2 many many years ago.
>- native machine code: that produced by compilers like C, C++, assemblers
>    Stuff that runs directly on the CPU using native CPU instructions.
>- threaded code: the usual example here is the language Forth. There are
>    variants of this, but here is one. Stuff in memory is typically a
>    sequence of addresses, with a few non-addresses mixed in. The addresses
>    are pointers into the code of the support framework for the language.
>    The system will have a central core, consisting of half a dozen or
>    less instructions, that just reads the next pointer in the sequence,
>    and branches to it. That sequence will do its thing, perhaps reading
>    a word of data from the sequence, then branching back to the central
>    core to allow the next operation.  E.g.
>	if a < 0 then
>	    Print("negative\n");
>	else
>	    Print("positive\n");
>	fi;
>    could be represented as:
>	@pushvar	primitive: push value onto stack
>	A		value: address of variable who's value to push
>	@pushconst	primitive: push constant value onto stack
>	0		value: the constant 0
>	@<		primitive: pop 2 args, compare, push 1 if '<' else 0
>	@if		primitive: the 'if' handler
>	@label1 	value: address of 'else' part
>	@pushconst
>	'negative'	value: the address of the string constant
>	@printstring	primitive: the thing that prints a string
>	@branch 	primitive: branch to the indicated place
>	@label2 	value: the address to branch to
>label1: @pushconst
>	'positive'	value: the address of the string constant
>	@printstring
>    So, you have to sort of 'compile' to this stuff, but it is a lot
>    easier than compiling to true native code. You don't care about the
>    details of instruction formats and stuff, you don't have to worry about
>    linking, and you don't have to worry about object file formats.
>    Other forms of threaded code have actual instruction sequences instead
>    of the above type of sequence. Subroutine call instructions are used to
>    get to the primitives, and they can use the return address to know
>    where their operands are.
>- bytecode: this is conceptually easier, but still has lots of variations
>    possible. You compile to something like the above, but just use
>    (typically) single-byte codes to represent which primitive you want
>    to use. E.g. for our example:
>	00: <numeric code for PUSHVAR>
>	01: <some pointer to 'a'
>	05: <numeric code for PUSHCONST>
>	06: 0 (4 bytes)
>	10: <numeric code for LESSTHAN>
>	11: <numeric code for IF>
>	12: 9 (2 bytes)
>	14: <numeric code for PUSHCONST>
>	15: <pointer to string>
>	19: <numeric code for PRINTSTRING>
>	20: <numeric code for BRANCH>
>	21: 6 (2 bytes)
>	23: <numeric code for PUSHCONST>
>	24: <pointer to string>
>	28: <numeric code for PRINTSTRING>
>	29:
>    The 'interpreter' of these bytecodes can be written in assembler, or
>    in some higher level language like C. Again, you sort of have to
>    compile to this stuff, but its fairly easy compilation.
>- parse tree traversal: (I use this). This is just a bunch of malloc'ed
>    memory, containing records of a union type, that have a type code
>    saying what they are (like the bytecode codes), plus pointers to
>    further records, or some constant values. Rough ASCII art follows:
>			------------------------
>			| IF |	   |	 |     |
>			------------------------
>			       /      |       \
>			      /       |        \
>			     /	      | 	\
>		    ---------	 -----------   -----------
>		    |<|  |  |	 |PSTR| ptr|   |PSTR| ptr|
>		    ---------	 -----------   -----------
>		      /   |
>		     /	  |
>	     -------- ---------
>	     |VAR| a| |CONST|0|
>	     -------- ---------
>    Just some kind of direct 'parse tree' of the program. The two parts
>    of the 'if' would often be nodes representing a sequence of other
>    nodes, terminated by one with a nil pointer (linked list of things
>    to do). The interpreter just does a recursive preorder (do the left
>    most node first) traversal of the tree, executing as it goes. For
>    an 'if', it evaluates the condition, and decides which of the two
>    subparts of the 'if' to execute. The overhead here is all of the
>    extra recursive calls of the interpreting routine. This structure
>    is fairly easy to work with, however, such as being 'pretty printed'
>    back out to ASCII form.
>- pre-tokenized interpretation: the program has been turned into a
>    sequence of codes and constants, like in bytecodes, but it hasn't
>    necessarily been fully checked for consistency. So, the interpreter
>    must continually check that the next codes make sense. Also, the
>    names of variables, etc. probably haven't been looked up yet (they
>    are just stored as ASCII strings), so the interpreter has to go
>    find what they refer to (if they are in fact valid!)
>- straight text: as above, but no pre-tokenization into chunks has been
>    done. All of the work of scanning the input is done over and over
>    again as the code is re-used. Quite slow, but doesn't require the
>    design of any extra data structures, magic codes, etc.
>Whew! I'm now 15 minutes late for my bus - hopefully I'll catch the next one!
>Chris Gray   cg at ami-cg.GraySage.Edmonton.AB.CA

More information about the MUD-Dev mailing list