six degrees of submission ... er, compilation.
Cynbe ru Taren
cynbe at laurel.actlab.utexas.edu
Thu Apr 10 01:32:58 New Zealand Standard Time 1997
greg at uni-corn.demon.co.uk
| On Mon, 7 Apr 1997, Chris Gray wrote:
| > Perhaps we can come up with a gradient (I'm sure this has been done before!),
| > and we can all point to where we are on it:
| > 1. native machine code
| > 2. threaded code
| > 3. bytecode
| > 4. parse tree traversal
| > 5. pre-tokenized interpretation
| > 6. straight text interpretation
| Would anyone be able to give a short description of all of these?
| (Especially threaded and bytecode)
I'll take a shot a that, if you wish. Basic question is how much
work you do once in a prepass through the code, and how much you
do as you actually execute it. The Dragon Book (Aho Sethi Ullman,
Principles of Compiler Design? -- my library is in storage) is
still the bible of the field as far as I know, if you really
wanna get into it.
(6) Straight text interpretation.
No prepass, just execute one long string of ascii mangled
over and over until done. Usually called a macro language
these days: m4, cpp, arguably csh &tc would be the contemporary
examples, I think. I believe this is essentially how tinyMUSH
does stuff, although I could be wrong.
(5) Pre-tokenized interpretation.
Do the lexical scan first, finding all the word boundaries and
probably replacing them by fixed-size integer/pointer values.
Early basics used to do something like this, I'm not sure what
a good contempory example would be.
(4) Parse tree traversal.
Do both the lexical scan for identifiers, and the grammatical
scan for statement structure (usually via LALR(1) parsers these
days, but I still like recursive descent :) in the prepass.
Lisp interpreters of the classical sort are an outstanding
example of this genre: They resolve the code to a List and
then interpret the list. Xlisp is an accessible example of
this sort of implementation, available free. (I wrote the
internals doc on it, few years, back... :)
Code is lexed, parsed and then compiled into something that
looks like native code, but for a fictional machine, usually
one picked for simplicity or such. If you then have a portable
program implementing the virtual machine, you have a very portable
system: This is how Pascal originally got established -- once the
demand was in place, native code compilers came later. Java appears
to be the same movie on fast forward :).
(2) Threaded code.
Much the same, except the virtual machine has been -really-
simplified. In the classic forth implementations that originated
the term, bytecodes were replaced by addresses of prim functions
implementing operations, so that the whole "virtual machine"
reduced essentially to
for (;;) (*pc++)();
-- just call the functions in the vector of pointers in order.
(Conditionals &tc can be done via prims which side-effect the
global pc variable.) By simply adding a 'call' opcode to the
front of each pointer, you can turn this kind of threaded code
into actual machine code, which happens to do way too many
function calls to be really efficient :). Worry about that and
tweak for efficiency and soon you're at
(1) Native machine code -- code optimized for the native instruction
set of the beast you're on. C is the canonical contemporary
example, but of course just about everyone plays this game these
days -- Java with JustInTime compilers, for example.
BTW, Perl does quite nice bytecode compilation, in my understanding,
a previous comment on this list notwithstanding.
| I have been trying to find out about
| threads - i got the pthreads lib for linux, the docs for that are
| impossible to understand without *some* kind of prior knowledge of what
| threads are - I have heard that the linux port of it isnt very
| good/stable/efficient and martin keegan has gone so far as to advise not
| to use them under any kind of unix..
Others are prolly more expert than I :). Threads are processes without
address spaces: Lots of program counters running around in a single
address space. See MacOS :). If you want multi-tasked and "efficiency"
more than anything else, you'll prolly wind up here. I'm assidously
avoiding this, I have enough design problems without opening this can
of worms :).
At the least, pthreads (POSIX Threads) are somewhat beta these days: I
have no doubt they will eventually be a standard part of the C programming
toolkit, once an infinite number of libraries have been appropriately
| Also, last week I declared all my code as junk. I'm now considering
| writing my mud in some kind of interpreted language, but efficiency and
| speed is a concern (is there anything else i should also be concerned
I think it is widely agreed that early on, we all tend to worry too
much about efficiency and too little about almost everything else :).
First, note that "native code" may be actually slower if the code is
executed only once and has no loops. This is the norm in some sorts
of applications, like user shells: This may be why csh doesn't compile
to native code. :)
Second, note that going to "native" code may not make your program
any faster. There are interpreted languages out there that run faster
than native code equivalents. Key question is how much time your
program spends in interpreter overhead, and how much doing useful
code. If it's spending 99.99% of its time outside the inner
interpreter, going to native code is going to buy you a max 0.001%
speedup :). Native code will win only if you're spending almost all
your time in virtual instructions so short that the inner interpreter
overhead dominates. This is likely to be true if you are doing mostly
integer operations, for example, but it is likely to be false if you
are doing mostly string operations. (Unless you really bozo the
inner interpreter implementation, of course!)
Third, note that speeding your program up may not make any practical
difference if your application is mostly I/O bound, say -- waiting for
user input or disk I/O or net I/O or such. I'd think most current
muds would most of the time fall into this category.
Fourth, I'd list as being potentially more important than efficiency a
whole range of things:
Speed of implementation. (A native code compiler may take longer to write.)
Reliability. Users prefer systems which stay up long enough to be useful. :)
Security. An interpreter may be able to check privs more consistently.
Portability: Might want to run the same code on multiple architectures,
possibly dynamically passing functions on the fly between
machines via the network.
Position independence: You might wanna swap code in small chunks off
disk on the fly, demand-paged, and find that this is difficult
in some native-code architectures.
Compactness: Bytecodes are usually significantly more compact than RISC
code: If the bottleneck in your app is downloading the code
via modem, say, you may wind up running faster overally using
bytecodes than native code.
Tool reuse: Everything else being equal, there's obviously a significant pull
these days to leveraging all the effort going into Java, say,
rather than re-inventing the virtual machine, bytecode compiler,
just-in-time compiler, GUI builder &tc &tc on your own -- and
then documenting it all :). Java happens to architecturally
forbid some of the things I want to do, and be ill-adapted to
others, and came along after I got started, so ... *sigh*.
| Im not 'definite' about going interpreted tho, although it seems
| highly likely that im about to embark on my sixth complete rewrite :-/
| Has anyone ever done an analysis comparing the above 6 methods? It would
| be interesting to look at.
I think I've implemented all of them at one time or another, and that
the most one can say is that engineering involves lots of tradeoffs,
and you have to consider things carefully one project at a time :).
More information about the MUD-Dev