cg at ami-cg.GraySage.Edmonton.AB.CA
Thu Apr 10 07:43:48 New Zealand Standard Time 1997
:> Perhaps we can come up with a gradient (I'm sure this has been done before!)
:> and we can all point to where we are on it:
:> 1. native machine code
:> 2. threaded code
:> 3. bytecode
:> 4. parse tree traversal
:> 5. pre-tokenized interpretation
:> 6. straight text interpretation
:Would anyone be able to give a short description of all of these?
:(Especially threaded and bytecode.) I have been trying to find out about
:threads - I got the pthreads lib for Linux, but the docs for that are
:impossible to understand without *some* kind of prior knowledge of what
:threads are. I have heard that the Linux port of it isn't very
:good/stable/efficient, and Martin Keegan has gone so far as to advise not
:to use them under any kind of Unix.
:Has anyone ever done an analysis comparing the above 6 methods? It would
:be interesting to look at.
Weeeell, since I opened this can of worms... First, note that I completely
made up the above list - it is not official in any way! Second, the
'threading' referred to has nothing to do with threads of execution. Sorry.
Take *all* of this stuff with lots of grains of salt. I've only ever done
4 and 6 myself. Oh, and a bit of 2, many, many years ago.
- native machine code: what compilers for C, C++, etc., and assemblers
produce. Stuff that runs directly on the CPU using native CPU instructions.
- threaded code: the usual example here is the language Forth. There are
variants of this, but here is one. Stuff in memory is typically a
sequence of addresses, with a few non-addresses mixed in. The addresses
are pointers into the code of the support framework for the language.
The system will have a central core, consisting of half a dozen or
fewer instructions, that just reads the next pointer in the sequence
and branches to it. The code at that address does its thing, perhaps
reading a word of data from the sequence (and stepping past it), then
branches back to the central core to allow the next operation. E.g.
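    if a < 0 then print 'negative' else print 'positive'
could be represented as:
            @pushvar      primitive: push a variable's value onto stack
            a             value: address of variable whose value to push
            @pushconst    primitive: push constant value onto stack
            0             value: the constant 0
            @<            primitive: pop 2 args, compare, push 1 if '<' else 0
            @if           primitive: the 'if' handler (pops the condition)
            @label1       value: address of 'else' part, used if condition is 0
            @pushconst    primitive: push constant value onto stack
            'negative'    value: the address of the string constant
            @printstring  primitive: the thing that prints a string
            @branch       primitive: branch to the indicated place
            @label2       value: the address to branch to
    label1: @pushconst    primitive: push constant value onto stack
            'positive'    value: the address of the string constant
            @printstring  primitive: the thing that prints a string
    label2: ...           (whatever follows the 'if')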
So, you have to sort of 'compile' to this stuff, but it is a lot
easier than compiling to true native code. You don't care about the
details of instruction formats and stuff, you don't have to worry about
linking, and you don't have to worry about object file formats.
Other forms of threaded code have actual instruction sequences instead
of the above type of sequence. Subroutine call instructions are used to
get to the primitives, and they can use the return address to know
where their operands are.
- bytecode: this is conceptually easier, but still has lots of variations
possible. You compile to something like the above, but just use
(typically) single-byte codes to represent which primitive you want
to use. E.g. for our example:
    00: <numeric code for PUSHVAR>
    01: <some pointer to 'a'> (4 bytes)
    05: <numeric code for PUSHCONST>
    06: 0 (4 bytes)
    10: <numeric code for LESSTHAN>
    11: <numeric code for IF>
    12: 9 (2 bytes; the 'else' part is at 14 + 9 = 23)
    14: <numeric code for PUSHCONST>
    15: <pointer to string 'negative'> (4 bytes)
    19: <numeric code for PRINTSTRING>
    20: <numeric code for BRANCH>
    21: 6 (2 bytes; the end is at 23 + 6 = 29)
    23: <numeric code for PUSHCONST>
    24: <pointer to string 'positive'> (4 bytes)
    28: <numeric code for PRINTSTRING>
The 'interpreter' of these bytecodes can be written in assembler, or
in some higher-level language like C. Again, you sort of have to
compile to this stuff, but it's a fairly easy compilation.
- parse tree traversal: (I use this). This is just a bunch of malloc'ed
memory, containing records of a union type, that have a type code
saying what they are (like the bytecode codes), plus pointers to
further records, or some constant values. Rough ASCII art follows:
                  ------------------
                  | IF |   |   |   |
                  ------------------
                        /    |    \
                      /      |      \
                    /        |        \
            ----------- ----------- -----------
            |<|   |   | |PSTR| ptr| |PSTR| ptr|
            ----------- ----------- -----------
                /   \
              /       \
        --------  ----------
        |VAR| a|  |CONST| 0|
        --------  ----------
Just some kind of direct 'parse tree' of the program. The two parts
of the 'if' would often be nodes representing a sequence of other
nodes, terminated by one with a nil pointer (linked list of things
to do). The interpreter just does a recursive preorder (leftmost node
first) traversal of the tree, executing as it goes. For an 'if', it
evaluates the condition, and decides which of the two subparts of the
'if' to execute. The overhead here is all of the extra recursive calls
of the interpreting routine. This structure is fairly easy to work
with, however; for example, it can be 'pretty printed' back out to
ASCII form.
- pre-tokenized interpretation: the program has been turned into a
sequence of codes and constants, like in bytecodes, but it hasn't
necessarily been fully checked for consistency. So, the interpreter
must continually check that the next codes make sense. Also, the
names of variables, etc. probably haven't been looked up yet (they
are just stored as ASCII strings), so the interpreter has to go
find what they refer to (if they are in fact valid!)
- straight text: as above, but no pre-tokenization into chunks has been
done. All of the work of scanning the input is done over and over
again as the code is re-used. Quite slow, but doesn't require the
design of any extra data structures, magic codes, etc.
Whew! I'm now 15 minutes late for my bus - hopefully I'll catch the next one!
Chris Gray cg at ami-cg.GraySage.Edmonton.AB.CA