[MUD-Dev] Re: Bruce Sterling on Virtual Community goals

Chris Gray cg at ami-cg.GraySage.Edmonton.AB.CA
Tue Oct 20 23:08:42 New Zealand Daylight Time 1998


[Jon A. Lambert:]

 >I don't see why an interpreter would not look virtually identical to a 
 >VM that processes bytecode.  An interpreter would normally use 
 >lazy evaluation and so could the bytecode generator.  That is code 
 >generation or execution would occur at the same points in the 
 >execution of the interpreter/code generator.  Possibly just a 
 >flag could control what output is desired.  State information saved 
 >for interpretation (labels) would just as easily be used in 
 >generating bytecode.

Not sure what you are getting at with "lazy evaluation". Not compiling
to bytecode until the function/module is called? And labels? What do
you need labels for?

 >Secondly a standard bytecode gives you a common target for
 >any desired language compiler.  C. Gray's horrendous case/esac 
 >(hehe) structures or my lovely if/endif repeat/until proclivities. 
 >;)

Fwap! (a la Gulliver's Travels)

 >Details:

Are these of the HLL or the bytecode? Things like control structures and
looping structures aren't needed in the byte code if you just have
PC relative branches like Java bytecode does. No labels are needed
unless you plan on having an assembler, which I think is a bad idea.
Going straight from parse trees to byte-code isn't that bad. My stuff
is under 1200 lines for that, and it includes several special cases
for optimization. Also, at the level of byte-code, symbols shouldn't
be needed. If you have to look things up in tables at runtime, then
you are again discarding much of the speed of bytecode. Locals and
parameters can be just offsets from a frame pointer for a stack machine.
You can keep the symbols for them around if you want to have symbolic
disassemblies, but stripping them out works fine too. Symbols are
needed when resolving newly loaded stuff, as you have discussed.
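
To make that concrete, here is a rough Python sketch of a stack machine
with PC-relative branches and frame-pointer-relative locals. It is only
an illustration, not code from any system mentioned here; the opcode
names, the encoding, and the run() helper are all made up for the
example.

```python
# Hypothetical opcodes, invented for this illustration.
PUSH, LOAD, STORE, ADD, SUB, JMP, JZ, RET = range(8)

def run(code, args, nlocals):
    """Run `code` on a tiny stack machine and return the top of stack at RET."""
    stack = list(args) + [0] * nlocals  # frame: parameters first, then locals
    fp = 0                              # frame pointer: base of the frame
    pc = 0
    while True:
        op = code[pc]
        if op == PUSH:
            stack.append(code[pc + 1]); pc += 2
        elif op == LOAD:                # operand is an offset from fp, not a name
            stack.append(stack[fp + code[pc + 1]]); pc += 2
        elif op == STORE:
            stack[fp + code[pc + 1]] = stack.pop(); pc += 2
        elif op == ADD:
            b = stack.pop(); a = stack.pop(); stack.append(a + b); pc += 1
        elif op == SUB:
            b = stack.pop(); a = stack.pop(); stack.append(a - b); pc += 1
        elif op == JMP:                 # offset is relative to the next instruction
            pc += 2 + code[pc + 1]
        elif op == JZ:                  # branch if the popped value is zero
            offset = code[pc + 1]
            pc += 2
            if stack.pop() == 0:
                pc += offset
        elif op == RET:
            return stack.pop()

# total = 0; while n != 0: total += n; n -= 1; return total
# Frame offset 0 is the parameter n, offset 1 is the local total.
SUM_TO_N = [
    LOAD, 0,     #  0: push n
    JZ, 16,      #  2: n == 0? skip the 16-byte loop body
    LOAD, 1,     #  4: push total
    LOAD, 0,     #  6: push n
    ADD,         #  8
    STORE, 1,    #  9: total = total + n
    LOAD, 0,     # 11: push n
    PUSH, 1,     # 13
    SUB,         # 15
    STORE, 0,    # 16: n = n - 1
    JMP, -20,    # 18: back to offset 0
    LOAD, 1,     # 20: push total
    RET,         # 22
]
```

The loop needs no labels and no symbol table at run time: the compiler
works out the two branch offsets when it emits the code, and "n" and
"total" exist only as the frame offsets 0 and 1.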

 >Primitive data types to be represented.
 >Complex data types (object).
 >Control structures.
 >Looping structures.
 >Subroutines/procedures/functions.
 >Label generation.
 >Symbolic storage.
 >Variable storage.
 >Type conversions/promotions.
 >Native routines call and return format/native symbol table.
 >Sub program/module call and return protocol.
 >Exception handling/trapping
 >Arithmetic processor.
 >Booleans/conditionals.
 >Operator precedence.

 >Aye.  A linked list or array of pointers sounds good.  Would this 
 >imply no argument type-checking, or is this something better left to
 >the language compiler to enforce?

I think that needs to be a fairly early decision. If you are doing a
strongly-typed language (you would need escapes in a system like this
of course), then the byte-codes emitted by the compiler contain all
of the needed type information. Thus, a function call knows nothing
explicit, at run-time, about parameter types. Making sure the types
match up is a dynamic linkage issue. This is the way Java does it, I
believe.

What I do is this: a byte-code function, when stored in the database,
is just a sequence of bytes that the system knows how to put back
together into higher-level structures. Those structures include the
result type and the count and types of the parameters. Thus, the
whole thing is pretty much independent. The only extra stuff is the
references from the byte-code to other functions/modules.
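
As a rough illustration of that kind of self-contained record, here is
a Python sketch that flattens a function to bytes and rebuilds it. The
layout, the one-byte type codes, and the idea of storing references as
names are all assumptions made for the example, not the format any
real system uses.

```python
import struct
from dataclasses import dataclass

# Hypothetical one-byte type codes.
T_VOID, T_INT, T_STRING = 0, 1, 2

@dataclass
class Function:
    result_type: int
    param_types: list   # one type code per parameter
    code: bytes         # the byte-code itself
    refs: list          # names of other functions/modules the code refers to

def pack(fn):
    """Flatten a Function into a single byte string for storage."""
    out = bytearray()
    out += struct.pack("<BB", fn.result_type, len(fn.param_types))
    out += bytes(fn.param_types)
    out += struct.pack("<H", len(fn.refs))
    for name in fn.refs:
        raw = name.encode("utf-8")
        out += struct.pack("<H", len(raw)) + raw
    out += struct.pack("<I", len(fn.code)) + fn.code
    return bytes(out)

def unpack(blob):
    """Rebuild the higher-level structure from the stored bytes."""
    result_type, nparams = struct.unpack_from("<BB", blob, 0)
    pos = 2
    param_types = list(blob[pos:pos + nparams]); pos += nparams
    (nrefs,) = struct.unpack_from("<H", blob, pos); pos += 2
    refs = []
    for _ in range(nrefs):
        (n,) = struct.unpack_from("<H", blob, pos); pos += 2
        refs.append(blob[pos:pos + n].decode("utf-8")); pos += n
    (ncode,) = struct.unpack_from("<I", blob, pos); pos += 4
    return Function(result_type, param_types, bytes(blob[pos:pos + ncode]), refs)
```

The result type and parameter types travel with the code, so the only
things that point outside the record are the entries in refs.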

If referenced functions/modules had specified types at compile time, then
the dynamic linking to newly loaded (or newly compiled) code should check
those things and fail the linkage if they don't match. If a reference
from existing byte-code to something new is an untyped call, then
the reference must contain the types of the actual parameters, and the
expected type of the result. When the actual call is made (as opposed
to when the code is loaded/linked), the checks must be made. You can
do those checks earlier as well, unless you allow for functions being
first-class objects, and specifically don't include the type info in
the type of the "function pointer". That's what I do, and that is my
escape from the strongly typed system.
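
The two checking points can be sketched like this in Python. This is a
simplified illustration only: Signature, LinkError, link() and
call_untyped() are names invented for the example, and types are
represented as plain strings.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Signature:
    params: tuple   # parameter type names, e.g. ("int", "string")
    result: str     # result type name

class LinkError(Exception):
    pass

def link(name, expected, actual):
    """Typed reference: compare once, when the code is loaded/linked."""
    if expected != actual:
        raise LinkError(f"{name}: compiled against {expected}, found {actual}")

def call_untyped(target, target_sig, arg_types, result_type, args):
    """Untyped call through a "function pointer": the reference carries the
    types of the actual parameters and the expected result, and they are
    checked every time the call is actually made."""
    if target_sig.params != tuple(arg_types) or target_sig.result != result_type:
        raise TypeError(f"call expects {tuple(arg_types)} -> {result_type}, "
                        f"target is {target_sig}")
    return target(*args)
```

With a typed reference the cost is paid once per link; with the untyped
"function pointer" it is paid on every call, which is the price of the
escape.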

Alternatively, if the system isn't strongly typed, then there is a
whole range of "weakly typed" designs that can be considered. Does the
bytecode have opcodes for, say, integer arithmetic? Or does it just have
an opcode for addition that will examine the types of its arguments
and do any needed conversions, etc.?
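
The difference between the two kinds of opcode, sketched in Python. The
function names and the conversion rules are assumptions picked for the
example; a real design would choose its own.

```python
def op_iadd(stack):
    """Typed opcode: the compiler guarantees two integers, so there is
    nothing to examine at run time."""
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)

def op_add(stack):
    """Generic opcode: examine the operand types and convert as needed.
    The rules here (string wins, then float) are invented for the sketch."""
    b = stack.pop()
    a = stack.pop()
    if isinstance(a, str) or isinstance(b, str):
        stack.append(str(a) + str(b))       # concatenate if either is a string
    elif isinstance(a, float) or isinstance(b, float):
        stack.append(float(a) + float(b))   # promote integer to float
    else:
        stack.append(a + b)                 # plain integer addition
```

The generic version pays for its type tests on every addition; the
typed version pushes that work back into the compiler.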

What I'm rambling about here is that I think this kind of decision
needs to be made fairly early on, for a given language/bytecode pair.
There is no reason the system can't have multiple choices of the above
at the same time, but then you have to carefully craft glue stuff to
make them talk to each other, and that requires intimate knowledge of
how they both do things, violating any information hiding principles.

 >A needless complication at this point, perhaps?

Jon was replying to something else, but it applies equally to what I've
just been saying!

--
Chris Gray     cg at ami-cg.GraySage.Edmonton.AB.CA



