[MUD-Dev] Storing tokens with flex & bison

cg at ami-cg.GraySage.Edmonton.AB.CA
Tue Jan 18 20:20:12 New Zealand Daylight Time 2000

[Jon A. Lambert:]

{My, we're certainly getting into some low level technical stuff here.
Those not interested, please skip all this! Marian, there will be a
test on the details tomorrow! :-)}

> I'm familiar with the Z-80, though I'm not sure how closely related it was to the
> 8080.  I know the Z-80 had over 400 instructions (if you count the modes), a
> much much larger set than the 6502.

The Z-80 is a superset of the 8080. CP/M software (like all my Draco system)
targeted the 8080 instructions only, for maximum compatibility.

> In particular, I recall the 'RST n'  opcode, which is pretty much the way I
> implement calls to built-in functions/library calls in my bytecode.  It allows
> me to add new built-in functions without recompiling old bytecode, as long
> as I don't change 'n' which is the index into a definition table.  BTW, this
> technique might have been useful in the DevMUD project for dynamically
> loading and registering new function libraries (DLLs).

Nod. I've done that when I needed the database to be stable, and didn't
want the key values for the builtins changing. A new major release lets
me clean 'em up again, however.

> Good point.  I could probably remove my check for underflow!  I'm dubious
> about removing the stack overflow check since I allow infinite recursion,
> theoretically at least.  My VM will timeout and cancel task long before any
> ridiculously huge stack size is reached.

What's your stack size? Mine is currently 8K. That allows 2048 local
variables, parameters and return addresses - I figure it'll do. Doesn't
take too long to hit that with deep recursion. Recursive Towers of Hanoi
has no trouble, however.
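The arithmetic there: 8K bytes of stack at 4 bytes per slot is 2048
entries. A sketch of the sort of fixed-size VM stack this implies (my
own invented names, not Chris's code), keeping the overflow check even
where underflow is impossible by construction:

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative numbers only: 8K bytes / 4-byte slots = 2048 entries,
// shared by locals, parameters and return addresses.
constexpr size_t STACK_BYTES = 8 * 1024;
constexpr size_t STACK_SLOTS = STACK_BYTES / sizeof(uint32_t);  // 2048

struct VmStack {
    uint32_t slots[STACK_SLOTS];
    size_t top = 0;
    // Push with an overflow check; deep recursion hits this fast.
    bool push(uint32_t v) {
        if (top == STACK_SLOTS)
            return false;           // stack overflow
        slots[top++] = v;
        return true;
    }
};
```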

> I'm not familiar with how the gcc dynamic goto stuff works.  I've included
> my Borland output near the end of this post.  Usually I do include locals
> at the very top of routines, for personal readability.  The only time I embed
> locals in block scope is with the C++ for-loop counters and those that have
> a high construction cost (classes).   I know that it is not necessarily optimal.

Me too on the locals. However, I was aiming for maximum speed here, so was
happy to relax my style to get it. Here's some snippets of my main byte-
code interpreter, to show the gcc dynamic goto stuff, along with the
alternative for non-gcc compilers.

static void
bcRun(QValue_t *pRes)
{
    BYTE_T *pc;
    ULONG_T *sp, *fp;
#ifdef GCC
    static void *(codeTable[]) = {
	&&label_bc_error,			    <== funny gcc syntax
	[snip - one &&label_... entry per opcode]
    };
#define CASE(x)	label_ ## x:
#define BREAK	goto *(codeTable[*pc++])	    <== funny gcc stuff
#else
    BcCode_t op;	/* make it enum to try to avoid switch range check */
#define CASE(x) case x:
#define BREAK	break
#endif
    [snip FETCH_AND_INC... macros - they fetch to 'ui', 'ul']

    pc = PC;
    sp = SP;
    fp = FP;
#ifdef GCC
    BREAK;					/* dispatch to first opcode */
#else
    while (B_TRUE) {
	op = *pc++ + bc_error;
	switch (op) {
#endif
	CASE(bc_error)
	    PC = pc;
	    runError("error 1 in byte-code execution");
	CASE(bc_ret) {
	    ULONG_T ul;
	    UINT_T ui;

	    [snip FETCH of 'ui' - the size of the locals]
	    sp = (ULONG_T *) ((BYTE_T *) sp + ui);  /* pop locals */
	    ul = *sp++;				/* pop return addr */
	    fp = (ULONG_T *) *sp++;		/* pop caller frame pointer */
	    [snip FETCH of 'ui' - the size of the parameters]
	    sp = (ULONG_T *) ((BYTE_T *) sp + ui);  /* pop parameters */
	    pc = (BYTE_T *) ul;			/* return */
	    FP = fp;
	    if (sp == StackTop || BcAbort) {
		/* Have returned from an outer call - return from bcRun */
		PC = pc;
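
Since the real bcRun is heavily snipped, here is a stripped-down but
complete version of the same dual-dispatch trick - opcodes and program
invented for illustration, using GNU labels-as-values where available
and falling back to a plain switch elsewhere:

```cpp
#include <cstdint>

enum Op : uint8_t { op_push, op_add, op_halt };

int64_t run(const uint8_t *pc) {
    int64_t stack[64], *sp = stack;
#if defined(__GNUC__)
    // Table of label addresses, indexed by opcode; dispatch is one
    // computed goto, with no loop and no switch range check.
    static void *codeTable[] = { &&label_op_push, &&label_op_add,
                                 &&label_op_halt };
#define CASE(x)  label_##x:
#define BREAK    goto *codeTable[*pc++]
    BREAK;                          /* dispatch to first opcode */
#else
#define CASE(x)  case x:
#define BREAK    break
    for (;;) switch (*pc++) {
#endif
    CASE(op_push)
        *sp++ = *pc++;              /* push next byte as a constant */
        BREAK;
    CASE(op_add) {
        int64_t ul = *--sp;         /* pop, add into new top of stack */
        sp[-1] += ul;
        BREAK;
    }
    CASE(op_halt)
        return sp[-1];
#if !defined(__GNUC__)
    }
#endif
}
```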

> >I say go for the special-case ones. I have some like 'pshz', 'psh1', etc.
> >They are very easy to do and use, and can have a noticeable speedup. I'm
> >real tempted to go add 'add1' and 'sub1', as in our earlier discussion,
> >to see what effect they have. Should take about 15 minutes.
> I tend to agree.  I would note that my weak-typing of variables prevents
> my use of specialized bytecodes except in the case of constants.  :(


I did the add1/sub1 experiment. It would have taken less than 10 minutes
if I hadn't left out a couple of 'break's in a code-generation switch. :-(

Unfortunately, there was no noticeable change in run speed with or without
them. However, with this version of gcc/egcs and RedHat Linux, the tests
all run slower than on the previous version (same machine). Grrrrrr!

> Sure, but there are other considerations beyond a small(?) performance
> gain that make it "sensible".   Design goals for instance.  Like providing a
> common target for multiple languages.  Implementing and debugging a new
> parser is half the cost of a new interpreter, since the execution piece is
> reusable.  Multi-tasking for another.  It's a good deal easier to interrupt a
> running VM externally and save its state than an executing Interpreter.
> Although I must admit using threads might simplify it.
> Or perhaps the idea of safely unwinding the native stack in an interpreter in
> order to flag an error and get back to square one just frightens me, and I find
> it easier to debug something after it has been decomposed into the
> simplest of functions.  For some reason I associate interpreters with
> monolithic code.

Sure, I'll buy some of those. The nature of my parse-tree interpreter is
such that those weren't issues with it, however. I just have a file-static
variable that says when to get out of any loops, and let the recursive
interpretation just dribble its way back to the top. 'interpreter.c' in
my ToyMUD sources is sort-of like that - it uses the return value in the
recursive calls to say whether or not to quit. As for multiple input
languages - well, I might have to add one or two new parse-tree node
types, but that would be about it.
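The "dribble back to the top" style looks roughly like this - a
made-up node layout, not my actual parse-tree code: every recursive
call checks an abort flag and returns a success flag, so the unwind is
just ordinary returns, with no native-stack surgery.

```cpp
// Invented tree-walking evaluator illustrating return-value unwinding.
struct Node { int value; Node *left, *right; };

static bool Abort;        // file-static: set from anywhere to bail out

// Returns false to mean "stop interpreting"; callers just pass it up.
bool eval(const Node *n, int &result) {
    if (Abort) return false;          // get out of any loops/recursion
    if (!n) { result = 0; return true; }
    int l = 0, r = 0;
    if (n->left  && !eval(n->left, l))  return false;
    if (n->right && !eval(n->right, r)) return false;
    result = n->value + l + r;        // sum the subtree, for the demo
    return true;
}
```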

> Ok, I'll pull back the covers.  And your guess is pretty accurate.  ;)

Do I win a cookie? :-)

> TMVar operator+(const TMVar& v1,const TMVar& v2) throw(Error)
> {
>   if ((v1.type != v2.type) && (v1.type != LIST)) {
>     throw (E_TYPE);
>   }
>   if ((v1.type == ARRAY) || (v1.type == ERROR) || (v1.type == OBJECT)) {
>     throw (E_TYPE);
>   }
>   TMVar ret(v1.type);

As I've mentioned, my C++ is lacking, but I think I'm OK with this. It's
semantically the same as

    TMVar ret = v1.type;

but without any possibility of a copy constructor call?
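A small probe of that question (`T` is a stand-in type I made up):
both forms call only the converting constructor on any modern
compiler. The historical wrinkle is that copy-initialization
(`T ret = x`) also required an accessible copy constructor, which the
compiler was merely *allowed* to elide, so "no possibility of a copy
constructor call" was strictly true only of the direct form
`TMVar ret(v1.type)`.

```cpp
// Count which constructors actually run for the two spellings.
struct T {
    int type;
    static int converts, copies;
    T(int t) : type(t) { ++converts; }          // converting ctor
    T(const T &o) : type(o.type) { ++copies; }  // copy ctor
};
int T::converts = 0;
int T::copies = 0;
```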

>   switch (v1.type) {
>     case NUM:
>        *((int*)ret.value) = *((int*)v1.value) + *((int*)v2.value);
>        break;
>     case LIST:
>        new(ret.v) TMList(*(TMList*)v1.value);

Here you've got me, however, and looking at the disassembly didn't help.
I grok the TMList construction and the use of 'new' to create a new object,
but what's the combination do?
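(Answering my own question after some digging: the combination is
"placement new". `new (ptr) T(arg)` runs T's constructor in the
storage at ptr instead of allocating - so that line copy-constructs a
TMList directly inside the value slot of `ret`. A self-contained
sketch with a made-up `TMList`:)

```cpp
#include <new>      // declares placement operator new

struct TMList {
    int len;
    TMList(int n) : len(n) {}
    TMList(const TMList &o) : len(o.len) {}   // copy constructor
};

// Raw, suitably aligned storage standing in for TMVar's value slot.
alignas(TMList) static unsigned char buffer[sizeof(TMList)];

TMList *copyInto(void *storage, const TMList &src) {
    return new (storage) TMList(src);   // construct-in-place, no malloc
}
```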

> And heres the native code.   I have register optimizations on, but I turned
> off constructor inlines so it could be followed.  Pardon the line breaks.

OK. Ouch, those exceptions are expensive!

[much snippage]
>       c:\tychomud\tmvar.cpp, 484: return ret;
>       00403B37 66C745D85C00             mov   word ptr [ebp-0x28],0x005c
>       00403B3D 8D55F0                   lea   edx,[ebp-0x10]
>       00403B40 52                       push  edx
>       00403B41 FF7508                   push  [ebp+0x08]
>       00403B44 E871F1FFFF               call  TMVar::TMVar(const TMVar &)
>       00403B49 83C408                   add   esp,0x08
>       00403B4C 8B4508                   mov   eax,[ebp+0x08]
>       00403B4F 66C745D86800             mov   word ptr [ebp-0x28],0x0068
>       00403B55 50                       push  eax
>       00403B56 6A02                     push  0x02
>       00403B58 8D55F0                   lea   edx,[ebp-0x10]
>       00403B5B 52                       push  edx
>       00403B5C E896F2FFFF               call  TMVar::~TMVar()
>       00403B61 83C408                   add   esp,0x08
>       00403B64 58                       pop   eax
>       00403B65 66C745D85C00             mov   word ptr [ebp-0x28],0x005c
>       00403B6B FF45E4                   inc   [ebp-0x1c]
>       00403B6E 8B55C8                   mov   edx,[ebp-0x38]
>       00403B71 64891500000000           mov   fs:[0x00000000],edx
>       c:\tychomud\tmvar.cpp, 485: }

I think this bit is what I was getting at with my worry about temporaries
and reference variables. But again, I'm far from an expert. This pair
of constructor/destructor calls on the 'return' seems undesirable.
If you rewrote the operator routines to take a 3rd reference parameter
that is where to store the result, wouldn't this extra stuff go away?
This isn't a reference case here, but perhaps something similar.
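What I mean, sketched with a simplified stand-in for TMVar (types and
names invented): instead of copy-constructing a temporary on return
and destroying it, the caller supplies the destination.

```cpp
// Simplified stand-in for the TMVar of the quoted code.
struct Var { int type; int value; };

// Result goes straight into caller-provided storage: no temporary on
// the return path, hence no copy constructor or destructor there.
void add(const Var &a, const Var &b, Var &out) {
    out.type = a.type;
    out.value = a.value + b.value;
}
```

The cost is losing the natural `v1 + v2` expression syntax, so it's a
trade of convenience against those constructor/destructor pairs.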

> Passing by reference is the same as passing by pointer as far as the
> machine code that is generated by the compiler.  Of course, as a general rule,
> I only use const reference parameters for safety.

Well, from a non-C++ expert: my understanding is that it is fairly easy
to accidentally pass a non-lvalue to a routine that requires a reference
parameter. The compiler will silently proceed to create the needed lvalue
by making a new temporary of the required type, and doing an
assignment of the expression to that temporary. It then passes the
address of the temporary to the routine. This is a bug if, in this
particular call, a value stored into that reference parameter matters.
It is a performance hit because of possible constructor/destructor calls
on the temporary, in addition to the value copy.
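To make that concrete (an invented type, and a note: conforming
compilers only allow this silent temporary for *const* references -
the stored-value bug arises on older compilers that also permitted it
for plain references):

```cpp
// A class with a converting constructor, so a plain int can be
// silently turned into a temporary Big at a call site.
struct Big {
    int n;
    static int ctors;
    Big(int v) : n(v) { ++ctors; }
};
int Big::ctors = 0;

// Takes a const reference; passing an int constructs a temporary Big,
// binds the reference to it, and destroys it after the call.
int twice(const Big &b) { return 2 * b.n; }
```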

Just for geek fun, here is the code gcc generated for my addition case.
This includes the final 'goto' thingy.

	CASE(bc_add) {
	    ULONG_T ul;

	    ul = *sp++;
0xc72 <bcRun+1298>:	movl   (%esi),%eax
0xc74 <bcRun+1300>:	addl   $0x4,%esi
	    *sp += ul;
0xc77 <bcRun+1303>:	addl   %eax,(%esi)
0xc79 <bcRun+1305>:	movzbl (%edi),%eax
0xc7c <bcRun+1308>:	incl   %edi
0xc7d <bcRun+1309>:	jmp    *0x0(,%eax,4)

(This is an unlinked .o file - there would normally be a non-zero offset
in that last 'jmp' instruction.)

Don't design inefficiency in - it'll happen in the implementation.

Chris Gray     cg at ami-cg.GraySage.Edmonton.AB.CA

MUD-Dev maillist  -  MUD-Dev at kanga.nu
