Bottom of Chapter Contents Previous Chapter Next Chapter

4
: A Comparable Code-Based Interpreter

The second step to comparing the efficiency of tree-based and instruction-based interpreters, having built a tree-based interpreter, is to create the instruction-based interpreter. This chapter explores the design choices taken in the augmenting the pJTI to create the Picojava Hybrid Interpreter (pJHI), so that it can interpret program trees which are composed of instructions as well as those composed of statements.

The design and implementation choices made here are all made with a view to providing an instruction-based interpreter which is as similar as possible to the existing statement-based interpreter. By doing this, it can be subsequently argued that any comparisons made between the two interpreters reflect only the differences arising from the structural differences of the two approaches, and not the underlying implementation detail.

4.1 Design Requirements

An instruction-based interpreter is still tree-based in thes sense that it requires a representation of the program tree. The abstract grammar of the Picojava assembly language, discussed alongside the Picojava source language in Chapter 2 and included in its entirety in Appendix B, gives the structure of this tree. The difference comes only at the level of Block:

In the statement interpreter, the Blocks are composed of a tree of Statements.
In the instruction interpreter, the Blocks are composed of a linear list of Instructions (called code).

As well as the program tree we also need some mechanism for keeping track of the dynamic state of a program. Instead of seeking to emulate the implementation choices with a real instruction-based Java VM, we choose to use a symbol table which mimics as closely as possible that used by the pJTI.

The differences will come as there will now be different requirements for interpreter state, the third and final ingredient of the interpreter. The design will now need to incorporate a program counter and an explicit operand stack, as instructions will need to interact with both these entities. In the statement interpreter, the program counter was successfully hidden in the implementation of Block, and the operand stack was implicit from the invocation stack of statements in the statement tree.

4.1.1 Symbol Table Requirements

While the Java VM does not have a readily identifiable symbol table, at least not for variables^[7], in order to fairly compare the statement interpreter and instruction interpreter, this element of implementation is retained as it is described in §3.1.1.

4.1.2 Interpreter State Requirements

The Java VM keeps a set of method invocation frames in order to keep track of the stack of methods which are currently executing, with the most recently invoked method being on top of the stack. Indeed, any instruction interpreter will need to keep track of method invocations, and to make it comparable to the pJTI, block invocations will also be required. There will be instruction invocations, but these are much less imporant than statement invocations, as they will not be directly nested in the same way as statement invocations must be.

In the Java VM, there is no such thing as a statement, and each method invocation frame has associated with it a program counter which indicates the current instruction executing in that method and an operand stack which keeps track of computations within each method. These will need to be included in an explicit way in an instruction interpreter, as they are inherent to the behaviour of such an interpreter, but they will be associated with Blocks and not Methods. This is in line with the way in which the pJTI has been designed.

4.1.3 Program Tree Requirements

In the Java VM, at the level of a method, the tree becomes a linearized structure where instructions are bytecoded. The interpreter suggested here does not necessarily use bytecodes, but it does use a linearized structure at the level of Block to represent Instructions.

Like a statement-based program tree, an instruction-based program tree is a structure which at its root has a single CompilationUnit node. Each node is in turn composed of further nodes, as given by the rules Picojava assembly language grammar of Appendix B.

4.2 Design Choices

4.2.1 Symbol Table Design

The symbol table will be a class, and will act primarily as a dictionary matching qualified identifiers with variables (of primitive types or class instances), classes and methods. This is as for the pJTI, and the design is more fully discussed in §3.2.1.

4.2.2 Interpreter State Design

Whereas the stack of statement invocations could be used within the pJTI to record the current progress of a computation, the same is not possible in an instruction interpreter. An explicit operand stack is needed in order to record the results from previous instructions so that they may be used by subsequent instructions; i.e. instructions are not composed further instructions in the same way as statements can be composed of further statements.

Therefore, in line with the way we augmented the symbol table designed in §3.2 with interpreter state, we will further augment the symbol table with operations for dealing with an explicit operand stacks:

One operand stack will be associated with each Block of Instructions. Therefore, operations for creating new empty operand stacks for newly invoked blocks and restoring existing operand stacks when those blocks terminate are required.
Operands will need to be placed on and removed from operand stacks, so operations will be required to perform these functions.

We also augment the symbol table with operations for maintaining a program counter. This will need to be altered by branch instructions in order to effect flow control, so it is not sufficient to include it in the implementation of the Blocks themselves (as was the case in the pJTI).

4.2.3 Program Tree Design

The following classes will be used to construct an instruction-based program tree:

The class Identifier will be instantiated once for each unique identifier used within a program. The implementation and use will not differ from the pJTI.
The class QualifiedIdentifier will be instantiated wherever appropriate according to the grammar in Appendix A. The implementation and use will not differ from the pJTI.
The subclasses of Algebraic, when instantiated, will store values of primitive type. Their implementation and use will not differ from the pJTI.
The subclasses of Instruction will represent the instructions composing a Block. As was the case for statements, an instruction will be constructed, executed via its run method, and be forced to output a concrete form of itself using its list method. Execution and listing implementations are fully contained within the definition of each single instruction, which is not the case for statements.
The subclasses of Declaration will represent variable declarations in the program tree. Their implementation and use will not differ from the pJTI.
The class Block will be used to build methods only. Like Instructions, a Block will have a run method which executes it (creating all the declared variables and executing instructions one at a time) and a list method that outputs a concrete representation, including all Declarations and Instructions.
The class Parameter will represent formal parameter specifications as used in methods. The implementation and use will not differ from the pJTI.
The class Method will be instantiated for each method and constructor appearing in the program tree. There will be slight differences in the implementation of methods, but they will be used in the same way as in the pJTI.
The class ClassBased_Class will define classes in the Picojava program tree. Its implementation and use will not differ from the pJTI.
The class CompilationUnit will collect together the ClassBased_Classes needed to form a complete program tree. The implementation and use will not differ from the pJTI.

4.3 Implementation Overview

Where functions of existing classes from the pJTI need to be augmented, the new classes will be defined as subclasses of the existing classes, prefixed with the word Code. It is assumed that such subclassing would not be naïvely implemented by any Java VM used to run any comparison, and as a result the use of subclasses carries no overhead.^[8]

4.3.1 Symbol Table Implementation

The new methods required in the implementation of CodeSymbolTable to provide for an operand stack are:

Create a new operand stack.
Place values on the current operand stack.
Examine the top value on the current operand stack.
Remove values from the current operand stack.
Restore a previous operand stack (if any).

To maintain a program counter, it is necessary to add methods for:

Resetting the program counter to 0.
Incrementing the program counter.
Reading the current value of the program counter.
Changing the program counter, under the direction of a branch instruction.

4.3.2 Interpreter State Implementation

The implementation of interpreter state is shared between the CodeSymbolTable class and the Java language itself. Some interpreter state is implicit in the nested method invocations and associated frames of variables within the Java environment, while the remainder is explicitly given, either in the variables used by executing methods, or within explicit repositories provided by the CodeSymbolTable.

4.3.3 Program Tree Implementation

All but the terminal components of an instruction-based program tree have a list method. This list method takes one parameter, the CodeSymbolTable which is associated with that program tree. The CodeSymbolTable contains all the information required to produce a concrete listing for a program tree: all that is necessary is to call the list method for the root CompilationUnit object of that tree with the appropriate CodeSymbolTable object as parameter.

Some components of the program tree are executable, and therefore also have a run method. Again, the run method only takes one parameter, the CodeSymbolTable. The CodeSymbolTable is passed from method to method as the program tree is traversed, and is dynamically updated with program state information, particularly with values of variables, the values on operand stacks and the value of the current program counter.

A run method in the instruction interpreter may have a return type, but it is not appropriate for any member of the Instruction class to have a return type for its run method.

The executable components of an instruction-based program tree (i.e. those with a run method) are:

CompilationUnit: the main class is instantiated, the first constructor is called, and the main method is executed.
Method: the actual parameters are removed from the operand stack of the caller, they are assigned to the newly created formal parameter variables, and the block is executed. The result value assigned within the block is returned as required.
Block: the declared local variables are created and assigned default values before the program counter is reset, a fresh operand stack is created, and the instructions are executed one at a time.
Instruction_Action_Branch: changes the program counter to the specified index so that the next instruction executed is not the next one in sequence.
Instruction_Action_Branch_ConditionalTrue: the top value on the operand stack is removed, and if it is true, the program counter is changed to the specified index. See Branch.
Instruction_Action_Branch_ConditionalFalse: the top value on the operand stack is removed, and if it is false, the program counter is changed to the specified index. See Branch.
Instruction_Action_Evaluate_Constant: the associated data value is placed on the operand stack.
Instruction_Action_Evaluate_UnaryOp: the unary operation defined by the subclass is performed upon the top operand stack value. The top value on the stack is replaced by the result thereby calculated.
Instruction_Action_Evaluate_BinaryOp: the top two elements of the operand stack are removed, and the binary operation defined by the subclass is performed, the return value being placed back on the stack. The value is returned to the enclosing statement as required.
Instruction_Action_Load: the referenced variable is looked up in the symbol table and its value is placed on the operand stack.
Instruction_Action_Store: the referenced variable is looked up in the symbol table and its value is replaced by the top value on the operand stack, which is then removed.
Instruction_Action_Evaluate_Method: executes the specific method, which will remove its parameters (if any) from the operand stack. If the method returns a value, this value is placed on the operand stack.
Instruction_Action_Create_Object: the class specified is instantiated and the indicated constructor method is called, which will remove its parameters (if any) from the operand stack. The newly created object is placed on the operand stack.

4.4 Implementation Details

4.4.1 Implementation Of Identifiers

Identifiers and QualifiedIdentifiers are implemented in the pJHI as they are implemented in the pJTI.

Identifiers for all but class and method references are eliminated in the Java VM, but in order to ensure that the two interpretive approaches can be structurally compared, we need to ensure that they are dealing with the same basic entities. For the instruction interpreter to deal with pointers while the chosen implementation for the statement interpreter is forced to look up identifiers in tables would be unacceptable. See §4.4.3 below for a further explanation of this.

4.4.2 Implementation Of Types And Values

The AlgebraicOrStructured class and its subclasses are implemented in the pJHI as they are implemented in the pJTI. This is because the data types in both the statement and instruction purposes need to be the same in order to write equivalent programs.

4.4.3 Implementation Of Symbol Dictionaries

There class dictionary and variable dictionary are implemented in the pJHI as they are implemented in the pJTI.

While the Java VM would use a class dictionary also, the function of a variable dictionary is eliminated through the use of pointers within instructions to reference a specific location in the current method invocation frame. For instance, Java's equivalent of the Instruction_Action_Load instruction looks like:

iload #3

This instruction means "load the integer from index three in the current method invocation frame". There is no reason why a more sophisticated version of the pJTI couldn't use a similar scheme, but for the purposes of the current comparison, it is sufficient to ensure that both interpreters use the same dictionary implementations. For example, one can imagine a statement like "reference the value associated with location three in the current block invocation frame".

4.4.4 Implementation Of Symbol Stacks

The pJHI uses symbol stacks in the same way as the pJTI, and the implementation is also the same.

Strictly speaking, because the instruction intepreter evaluates actual parameters before the time of method call instead of during the time of method call, IcyStacks (§3.4.4) are not needed. However, it is convenient to use the same implementation of symbol stacks in the instruction interpreter, and it removes any possible advantage which might be gained by a different implementation.

4.4.5 Implementation Of The Return Variable

The return variable is used in the same way and implemented in the same way in both the pJTI and pJHI.

4.4.6 Implementation Of Concrete Listing Generation

Once again, there is no change between the use and implementation of concrete listing generation between the pJTI and pJHI. Of course, the list methods list statements and instructions in different ways, but the mechanisms for that listing are the same.

4.4.7 Implementation Of Operand Stack And Program Counter

As well as the methods provided for convenience (§3.4.7), which are implemented in the same way in both interpreters, the pJHI has additional methods for managing the explicit operand stack and program counter:

push(AlgebraicOrStructured): places a new data value on the current operand stack.
pop(): returns the AlgebraicOrStructured value on top of the current operand stack, removing it from that stack.
peek(): returns the AlgebraicOrStructured value on top of the current operand stack, preserving that stack.
pushStack(): creates a new operand stack and makes it the current stack.
popStack(): restores the previous operand stack.

The stack of operand stacks, and the operand stacks themselves are both implemented using instances of java.util.Stack.

4.4.8 Implementation Of The Compilation Unit

At the level of CompilationUnit, there is no functional change from the implementation in the pJTI to that in the pJHI.

4.4.9 Implementation Of Classes

Again, there is no functional change in Classes from the implementation in the pJTI to that in the pJHI.

4.4.10 Implementation Of Methods

CodeMethods are constructed using CodeMethod(Identifier, CodeBlock) while constructors are constructed using CodeMethod(CodeBlock). Parameters are then added using addParameter(Parameter), with a Parameter having no identifier being identified as the return parameter.

setParameter(SymbolTable, int, AlgebraicOrStructured): sets the values of the indexed formal parameter prior to a call to the run method. Unlike Methods, CodeMethods do not have to evaluate the parameters at this point- this is done in the code prior to the call.
run(SymbolTable): the block composing the method is executed, and the AlgebraicOrStructured value assigned to the return parameter (if any) by the block is returned to the caller.
list(SymbolTable): lists the method in a syntax similar to that of a Java disassembler.

As for Methods, the Parameters of a method are stored in an instance of java.util.Vector.

4.4.11 Implementation Of Blocks

CodeBlocks are constructed using Block(). Declarations are then added using addDeclaration(Declaration). Instructions are added using addInstruction (Instruction).

run(SymbolTable): ascribes new meanings to the declared identifiers, defining them as local variables with appropriate default values. A new operand stack is created using pushStack(). The instructions composing the block are then executed one at a time. Finally, popStack() is used to restore the previous operand stack (if any).
list(SymbolTable): draws a brace and indents the listing. A call is then made to getNumberDeclarations() to determine how many declarations are to be listed, and then these are listed. getNumberInstructions() is then used to determine how many instructions are to be listed, and these too are listed, with indexes down the left hand-side of the block.

As was the case for Blocks, the Declarations and Instructions of a block are stored in separate instances of java.util.Vector.

4.4.12 Implementation Of Flow Control Instructions

The construction of a Instruction_Action_Branch, Instruction_Action_Branch_ ConditionalTrue or Instruction_Action_Branch_ConditionalFalse requires only a single parameter indexing the instruction in the current block which is to be branched to.

The implementation of the run method requires only that the condition (if any) be tested, and the branch be taken if the condition has the required value. A straight Instruction_Action_Branch has no condition to test- it branches uncoditionally.

As an example of implementation, consider Instruction_Action_Branch_ ConditionalTrue:

class Instruction_Branch_ConditionalTrue extends

    Instruction_Branch
{

  public Instruction_Branch_ConditionalTrue

      (int lineNumber)

    super(lineNumber);

  public void run(CodeSymbolTable symbolTable)

  {
    if (((Algebraic_Boolean)symbolTable.pop()) 
         .getValue())
    {
      symbolTable.branch(destination);
    }
  }

  public void list(CodeSymbolTable symbolTable)

  {
    System.out.print("iftrue "+destination);
    symbolTable.newListingLine();
  }
}

Here, destination is the protected instance variable from the parent class Instruction_Action_Branch, which looks like this:

class Instruction_Branch extends Instruction

  protected int destination;

  public Instruction_Branch(int lineNumber)

  {
    destination=lineNumber;
  }

  public void run(CodeSymbolTable symbolTable)

  {
    symbolTable.branch(destination);
  }

  public void list(CodeSymbolTable symbolTable)

  {
    System.out.print("goto "+destination);
    symbolTable.newListingLine();
  }
}

4.4.13 Implementation Of Constant Value Generation

A constant value generation instruction is simply implemented by placing the constant value encoded within the instruction on the operand stack. This is achieved using the push method within the augmented CodeSymbolTable class (§4.4.7). Consider Instruction_Action_ Evaluate_Constant_SignedInteger5 as an example:

abstract class Instruction_Action_Evaluate_Constant_

    Algebraic extends Instruction_Action_Evaluate_

    Constant
{

  protected Algebraic value;

  public Instruction_Action_Evaluate_Constant_

      Algebraic(Algebraic newValue)
  {
    value=newValue;
  }

  public void run(CodeSymbolTable symbolTable)

  {
    symbolTable.push((AlgebraicOrStructured)value);
  }
}

class Instruction_Action_Evaluate_Constant_Algebraic_

    SignedInteger5 extends Instruction_Action_

    Evaluate_Constant_Algebraic
{

  public Instruction_Action_Evaluate_Constant_

      Algebraic_SignedInteger5(Algebraic newValue)
  {
    super(newValue);
  }

  public void list(CodeSymbolTable symbolTable)

  {
    System.out.print("iconst ("+((Algebraic_
                                  SignedInteger5)
                                  value)
                                 .getValue()+")");
    symbolTable.newListingLine();
  }
}

4.4.14 Implementation Of Unary And Binary Operations

The unary and binary operations are implemented by code similar to that for Instruction_Action_Evaluate_Constant (§4.4.13), except that these two operations must also use the CodeSymbolTable's pop operation to obtain their operands.

If we consider the run method for Instruction_Action_Evaluate_BinaryOp_Sum, we will see how such operations are constructed:

  public void run

      (CodeSymbolTable symbolTable)
  {
    symbolTable.push((AlgebraicOrStructured)
                     new Algebraic_SignedInteger5(
                       ((Algebraic_SignedInteger5)
                        symbolTable.pop()).getValue()+
                       ((Algebraic_SignedInteger5)
                        symbolTable.pop()).getValue()
                     )
                    );
  }

Each operation also has a list method which outputs the instruction mnemonic.

4.4.15 Implementation Of Data Transfer Instructions

The data transfer instructions operate on a value from the variable dictionary, either replacing it with a new value taken from the operand stack (Instruction_Action_Store), or placing its value on the operand stack for use in a computation (Instruction_Action_Load). They therefore use the pop and push methods of CodeSymbolTable, used for manipulating the current operand stack.

The implementation of Instruction_Action_Load is:

class Instruction_Action_Load extends Instruction_

    Action
{

  protected QualifiedIdentifier identifier;

  ...

  public void run(CodeSymbolTable symbolTable)
  {
    symbolTable.push(symbolTable.getQualifiedValue
                       (identifier));
  }

  ...
}

There is also a destructor for listing a load instruction, list.

The implementation of Instruction_Action_Store is:

class Instruction_Action_Store extends Instruction_

    Action
{

  protected QualifiedIdentifier identifier;

...

  public void run(CodeSymbolTable symbolTable)

    if (identifier==null) {

      symbolTable.setReturnValue(symbolTable.pop());
    } else {
      symbolTable.setQualifiedValue(identifier,
                                    symbolTable.pop()
                                   );
    }
  }

  ...
}

You will notice that there is provision in the code for the parameter to this instruction being null. This is the special case of the store statement being used to assign the return variable wihin a method, as discussed in §2.15.3.

Instruction_Action_Store also has a destructor for listing, list.

4.4.16 Implementation Of Method Invocation Instructions

Whereas method invocation statements (§3.4.16) needed storage within themselves for the actual parameter expressions, method invocation instructions do not carry such information. The parameters are taken from the operand stack.

class Instruction_Action_Evaluate_Method extends

    Instruction_Action_Evaluate
{

  private QualifiedIdentifier identifier;

  private int actualParameters;

  // Constructor to create an instruction which calls 
  // a specified method

  public Instruction_Action_Evaluate_Method(

    QualifiedIdentifier newIdentifier,
    CodeMethod newMethod)
  {
    identifier=newIdentifier;
    actualParameters=newMethod.getNumberParameters();
  }

  // Destructor which pushes the result of invoking a 
  // method

  public void run(CodeSymbolTable symbolTable)

  {
    AlgebraicOrStructured result;
    ObjectInterface object;
    CodeMethod method;

    // Obtain reference to object and make current
    object=symbolTable.setCurrentObject(
             symbolTable.getParentObject(identifier)
           );

    // Obtain reference to method
    method=(CodeMethod)(object.getParentClass())
                       .getMethod(identifier
                                  .getIdentifier());

    // Set formal parameters for method, popping them 
    // off operand stack

    for (int i=0; i<actualParameters; i++)

      method.setParameter(symbolTable,i,
                          symbolTable.pop());

    // Execute method
    result=method.run(symbolTable);

    // Pop current object
    symbolTable.popCurrentObject();

    // Push result onto operand stack
    symbolTable.push(result);
  }

  // Destructor which writes a method call to the 
  // output stream

  public void list(CodeSymbolTable symbolTable)

  {
    System.out.print("invokevirtual_"+
                     actualParameters+" ("+
                     symbolTable
                     .getQualifiedSymbolName
                     (identifier)+")");
    symbolTable.newListingLine();
  }
}

4.4.17 Implementation Of Class Instance Creation Instructions

Whereas class instance creation statements (§3.4.17) needed storage within themselves for the actual parameter expressions for calling the constructor, class instance creation instructions do not carry such information. The parameters are taken from the operand stack.

class Instruction_Action_Create_Object extends

    Instruction_Action_Create
{

  private Identifier prototypeClass;

  private int constructorSignature;

  private int constructionParameters;

  // Constructor to make an instruction to create an 
  // object

  public Instruction_Action_Create_Object(

    CodeSymbolTable symbolTable,
    Identifier newPrototypeClass,

    int newConstructorSignature)

  {
    prototypeClass=newPrototypeClass;
    constructorSignature=newConstructorSignature;
    constructionParameters=((symbolTable.getClass
                               (prototypeClass))
                            .getMethodAt
                               (constructorSignature)
                           ).getNumberParameters();
  }

  // Destructor which pushes a new object onto the 
  // operand stack

  public void run(CodeSymbolTable symbolTable)

  {
    ClassInterface prototype;
    ObjectInterface object;
    CodeMethod method;

    // Obtain reference to prototype class
    prototype=symbolTable.getClass(prototypeClass);

    // Create object
    object=symbolTable.setCurrentObject(new 
             Structured_Object(prototype));

    // Obtain reference to method
    method=(CodeMethod)prototype.getMethodAt
             (constructorSignature);

    // Set formal parameters for method

    for (int i=0; i<constructionParameters; i++)

      method.setParameter(symbolTable,i,
                          symbolTable.pop());

    // Execute method
    method.run(symbolTable);

    // Push created object onto operand stack
    symbolTable.push((AlgebraicOrStructured)
                     symbolTable.popCurrentObject());
  }

  // Destructor which writes a method call to the 
  // output stream

  public void list(CodeSymbolTable symbolTable)

  {
    // Write identifier of prototype class
    System.out.print("new_"+constructionParameters+
                     " ("+symbolTable.getSymbolName
                            (prototypeClass)+")");
    symbolTable.newListingLine();
  }
}

^[7] The Java VM obviously must have a symbol table for classes.

^[8] In truth, the final implementation of the pJHI has SymbolTable as a superinterface, TreeSymbolTable and CodeSymbolTable as subinterfaces, and HashingTreeSymbolTable and HashingCodeSymbolTable as subclasses of HashingSymbolTable which implement these respective interfaces. The other classes which are changed are subclassed and subinterfaced in similar ways.

Previous Chapter Next Chapter Top Of Chapter

Exit:

Java- Trees Versus Bytes is a BComp Honours thesis written by Kade Hansson.

Last updated: Monday 12th February 2001

4 : A Comparable Code-Based Interpreter