The second step to comparing the efficiency of tree-based and instruction-based interpreters, having built a tree-based interpreter, is to create the instruction-based interpreter. This chapter explores the design choices taken in the augmenting the pJTI to create the Picojava Hybrid Interpreter (pJHI), so that it can interpret program trees which are composed of instructions as well as those composed of statements.
The design and implementation choices made here are all made with a view to providing an instruction-based interpreter which is as similar as possible to the existing statement-based interpreter. By doing this, it can be subsequently argued that any comparisons made between the two interpreters reflect only the differences arising from the structural differences of the two approaches, and not the underlying implementation detail.
An instruction-based interpreter is still tree-based in thes sense that it requires a representation of the program tree. The abstract grammar of the Picojava assembly language, discussed alongside the Picojava source language in Chapter 2 and included in its entirety in Appendix B, gives the structure of this tree. The difference comes only at the level of Block:
As well as the program tree we also need some mechanism for keeping track of the dynamic state of a program. Instead of seeking to emulate the implementation choices with a real instruction-based Java VM, we choose to use a symbol table which mimics as closely as possible that used by the pJTI.
The differences will come as there will now be different requirements for interpreter state, the third and final ingredient of the interpreter. The design will now need to incorporate a program counter and an explicit operand stack, as instructions will need to interact with both these entities. In the statement interpreter, the program counter was successfully hidden in the implementation of Block, and the operand stack was implicit from the invocation stack of statements in the statement tree.
While the Java VM does not have a readily identifiable symbol table, at least not for variables[7], in order to fairly compare the statement interpreter and instruction interpreter, this element of implementation is retained as it is described in §3.1.1.
The Java VM keeps a set of method invocation frames in order to keep track of the stack of methods which are currently executing, with the most recently invoked method being on top of the stack. Indeed, any instruction interpreter will need to keep track of method invocations, and to make it comparable to the pJTI, block invocations will also be required. There will be instruction invocations, but these are much less imporant than statement invocations, as they will not be directly nested in the same way as statement invocations must be.
In the Java VM, there is no such thing as a statement, and each method invocation frame has associated with it a program counter which indicates the current instruction executing in that method and an operand stack which keeps track of computations within each method. These will need to be included in an explicit way in an instruction interpreter, as they are inherent to the behaviour of such an interpreter, but they will be associated with Blocks and not Methods. This is in line with the way in which the pJTI has been designed.
In the Java VM, at the level of a method, the tree becomes a linearized structure where instructions are bytecoded. The interpreter suggested here does not necessarily use bytecodes, but it does use a linearized structure at the level of Block to represent Instructions.
Like a statement-based program tree, an instruction-based program tree is a structure which at its root has a single CompilationUnit node. Each node is in turn composed of further nodes, as given by the rules Picojava assembly language grammar of Appendix B.
The symbol table will be a class, and will act primarily as a dictionary matching qualified identifiers with variables (of primitive types or class instances), classes and methods. This is as for the pJTI, and the design is more fully discussed in §3.2.1.
Whereas the stack of statement invocations could be used within the pJTI to record the current progress of a computation, the same is not possible in an instruction interpreter. An explicit operand stack is needed in order to record the results from previous instructions so that they may be used by subsequent instructions; i.e. instructions are not composed further instructions in the same way as statements can be composed of further statements.
Therefore, in line with the way we augmented the symbol table designed in §3.2 with interpreter state, we will further augment the symbol table with operations for dealing with an explicit operand stacks:
We also augment the symbol table with operations for maintaining a program counter. This will need to be altered by branch instructions in order to effect flow control, so it is not sufficient to include it in the implementation of the Blocks themselves (as was the case in the pJTI).
The following classes will be used to construct an instruction-based program tree:
Where functions of existing classes from the pJTI need to be augmented, the new classes will be defined as subclasses of the existing classes, prefixed with the word Code. It is assumed that such subclassing would not be naïvely implemented by any Java VM used to run any comparison, and as a result the use of subclasses carries no overhead.[8]
The new methods required in the implementation of CodeSymbolTable to provide for an operand stack are:
To maintain a program counter, it is necessary to add methods for:
The implementation of interpreter state is shared between the CodeSymbolTable class and the Java language itself. Some interpreter state is implicit in the nested method invocations and associated frames of variables within the Java environment, while the remainder is explicitly given, either in the variables used by executing methods, or within explicit repositories provided by the CodeSymbolTable.
All but the terminal components of an instruction-based program tree have a list method. This list method takes one parameter, the CodeSymbolTable which is associated with that program tree. The CodeSymbolTable contains all the information required to produce a concrete listing for a program tree: all that is necessary is to call the list method for the root CompilationUnit object of that tree with the appropriate CodeSymbolTable object as parameter.
Some components of the program tree are executable, and therefore also have a run method. Again, the run method only takes one parameter, the CodeSymbolTable. The CodeSymbolTable is passed from method to method as the program tree is traversed, and is dynamically updated with program state information, particularly with values of variables, the values on operand stacks and the value of the current program counter.
A run method in the instruction interpreter may have a return type, but it is not appropriate for any member of the Instruction class to have a return type for its run method.
The executable components of an instruction-based program tree (i.e. those with a run method) are:
Identifiers and QualifiedIdentifiers are implemented in the pJHI as they are implemented in the pJTI.
Identifiers for all but class and method references are eliminated in the Java VM, but in order to ensure that the two interpretive approaches can be structurally compared, we need to ensure that they are dealing with the same basic entities. For the instruction interpreter to deal with pointers while the chosen implementation for the statement interpreter is forced to look up identifiers in tables would be unacceptable. See §4.4.3 below for a further explanation of this.
The AlgebraicOrStructured class and its subclasses are implemented in the pJHI as they are implemented in the pJTI. This is because the data types in both the statement and instruction purposes need to be the same in order to write equivalent programs.
There class dictionary and variable dictionary are implemented in the pJHI as they are implemented in the pJTI.
While the Java VM would use a class dictionary also, the function of a variable dictionary is eliminated through the use of pointers within instructions to reference a specific location in the current method invocation frame. For instance, Java's equivalent of the Instruction_Action_Load instruction looks like:
iload #3
This instruction means "load the integer from index three in the current method invocation frame". There is no reason why a more sophisticated version of the pJTI couldn't use a similar scheme, but for the purposes of the current comparison, it is sufficient to ensure that both interpreters use the same dictionary implementations. For example, one can imagine a statement like "reference the value associated with location three in the current block invocation frame".
The pJHI uses symbol stacks in the same way as the pJTI, and the implementation is also the same.
Strictly speaking, because the instruction intepreter evaluates actual parameters before the time of method call instead of during the time of method call, IcyStacks (§3.4.4) are not needed. However, it is convenient to use the same implementation of symbol stacks in the instruction interpreter, and it removes any possible advantage which might be gained by a different implementation.
The return variable is used in the same way and implemented in the same way in both the pJTI and pJHI.
Once again, there is no change between the use and implementation of concrete listing generation between the pJTI and pJHI. Of course, the list methods list statements and instructions in different ways, but the mechanisms for that listing are the same.
As well as the methods provided for convenience (§3.4.7), which are implemented in the same way in both interpreters, the pJHI has additional methods for managing the explicit operand stack and program counter:
The stack of operand stacks, and the operand stacks themselves are both implemented using instances of java.util.Stack.
At the level of CompilationUnit, there is no functional change from the implementation in the pJTI to that in the pJHI.
Again, there is no functional change in Classes from the implementation in the pJTI to that in the pJHI.
CodeMethods are constructed using CodeMethod(Identifier, CodeBlock) while constructors are constructed using CodeMethod(CodeBlock). Parameters are then added using addParameter(Parameter), with a Parameter having no identifier being identified as the return parameter.
As for Methods, the Parameters of a method are stored in an instance of java.util.Vector.
CodeBlocks are constructed using Block(). Declarations are then added using addDeclaration(Declaration). Instructions are added using addInstruction (Instruction).
As was the case for Blocks, the Declarations and Instructions of a block are stored in separate instances of java.util.Vector.
The construction of a Instruction_Action_Branch, Instruction_Action_Branch_ ConditionalTrue or Instruction_Action_Branch_ConditionalFalse requires only a single parameter indexing the instruction in the current block which is to be branched to.
The implementation of the run method requires only that the condition (if any) be tested, and the branch be taken if the condition has the required value. A straight Instruction_Action_Branch has no condition to test- it branches uncoditionally.
As an example of implementation, consider Instruction_Action_Branch_ ConditionalTrue:
class Instruction_Branch_ConditionalTrue extends
Instruction_Branch {
public Instruction_Branch_ConditionalTrue
(int lineNumber)
{
super(lineNumber);
}
public void run(CodeSymbolTable symbolTable)
{ if (((Algebraic_Boolean)symbolTable.pop()) .getValue()) { symbolTable.branch(destination); } }
public void list(CodeSymbolTable symbolTable)
{ System.out.print("iftrue "+destination); symbolTable.newListingLine(); } }
Here, destination is the protected instance variable from the parent class Instruction_Action_Branch, which looks like this:
class Instruction_Branch extends Instruction
{
protected int destination;
public Instruction_Branch(int lineNumber)
{ destination=lineNumber; }
public void run(CodeSymbolTable symbolTable)
{ symbolTable.branch(destination); }
public void list(CodeSymbolTable symbolTable)
{ System.out.print("goto "+destination); symbolTable.newListingLine(); } }
A constant value generation instruction is simply implemented by placing the constant value encoded within the instruction on the operand stack. This is achieved using the push method within the augmented CodeSymbolTable class (§4.4.7). Consider Instruction_Action_ Evaluate_Constant_SignedInteger5 as an example:
abstract class Instruction_Action_Evaluate_Constant_
Algebraic extends Instruction_Action_Evaluate_
Constant {
protected Algebraic value;
public Instruction_Action_Evaluate_Constant_
Algebraic(Algebraic newValue) { value=newValue; }
public void run(CodeSymbolTable symbolTable)
{ symbolTable.push((AlgebraicOrStructured)value); } }
class Instruction_Action_Evaluate_Constant_Algebraic_
SignedInteger5 extends Instruction_Action_
Evaluate_Constant_Algebraic {
public Instruction_Action_Evaluate_Constant_
Algebraic_SignedInteger5(Algebraic newValue) { super(newValue); }
public void list(CodeSymbolTable symbolTable)
{ System.out.print("iconst ("+((Algebraic_ SignedInteger5) value) .getValue()+")"); symbolTable.newListingLine(); } }
The unary and binary operations are implemented by code similar to that for Instruction_Action_Evaluate_Constant (§4.4.13), except that these two operations must also use the CodeSymbolTable's pop operation to obtain their operands.
If we consider the run method for Instruction_Action_Evaluate_BinaryOp_Sum, we will see how such operations are constructed:
public void run
(CodeSymbolTable symbolTable) { symbolTable.push((AlgebraicOrStructured) new Algebraic_SignedInteger5( ((Algebraic_SignedInteger5) symbolTable.pop()).getValue()+ ((Algebraic_SignedInteger5) symbolTable.pop()).getValue() ) ); }
Each operation also has a list method which outputs the instruction mnemonic.
The data transfer instructions operate on a value from the variable dictionary, either replacing it with a new value taken from the operand stack (Instruction_Action_Store), or placing its value on the operand stack for use in a computation (Instruction_Action_Load). They therefore use the pop and push methods of CodeSymbolTable, used for manipulating the current operand stack.
The implementation of Instruction_Action_Load is:
class Instruction_Action_Load extends Instruction_
Action {
protected QualifiedIdentifier identifier;
... public void run(CodeSymbolTable symbolTable) { symbolTable.push(symbolTable.getQualifiedValue (identifier)); } ... }
There is also a destructor for listing a load instruction, list.
The implementation of Instruction_Action_Store is:
class Instruction_Action_Store extends Instruction_
Action {
protected QualifiedIdentifier identifier;
...
public void run(CodeSymbolTable symbolTable)
{
if (identifier==null) {
symbolTable.setReturnValue(symbolTable.pop()); } else { symbolTable.setQualifiedValue(identifier, symbolTable.pop() ); } } ... }
You will notice that there is provision in the code for the parameter to this instruction being null. This is the special case of the store statement being used to assign the return variable wihin a method, as discussed in §2.15.3.
Instruction_Action_Store also has a destructor for listing, list.
Whereas method invocation statements (§3.4.16) needed storage within themselves for the actual parameter expressions, method invocation instructions do not carry such information. The parameters are taken from the operand stack.
class Instruction_Action_Evaluate_Method extends
Instruction_Action_Evaluate {
private QualifiedIdentifier identifier;
private int actualParameters;
// Constructor to create an instruction which calls // a specified method
public Instruction_Action_Evaluate_Method(
QualifiedIdentifier newIdentifier, CodeMethod newMethod) { identifier=newIdentifier; actualParameters=newMethod.getNumberParameters(); } // Destructor which pushes the result of invoking a // method
public void run(CodeSymbolTable symbolTable)
{ AlgebraicOrStructured result; ObjectInterface object; CodeMethod method; // Obtain reference to object and make current object=symbolTable.setCurrentObject( symbolTable.getParentObject(identifier) ); // Obtain reference to method method=(CodeMethod)(object.getParentClass()) .getMethod(identifier .getIdentifier()); // Set formal parameters for method, popping them // off operand stack
for (int i=0; i<actualParameters; i++)
method.setParameter(symbolTable,i, symbolTable.pop()); // Execute method result=method.run(symbolTable); // Pop current object symbolTable.popCurrentObject(); // Push result onto operand stack symbolTable.push(result); } // Destructor which writes a method call to the // output stream
public void list(CodeSymbolTable symbolTable)
{ System.out.print("invokevirtual_"+ actualParameters+" ("+ symbolTable .getQualifiedSymbolName (identifier)+")"); symbolTable.newListingLine(); } }
Whereas class instance creation statements (§3.4.17) needed storage within themselves for the actual parameter expressions for calling the constructor, class instance creation instructions do not carry such information. The parameters are taken from the operand stack.
class Instruction_Action_Create_Object extends
Instruction_Action_Create {
private Identifier prototypeClass;
private int constructorSignature;
private int constructionParameters;
// Constructor to make an instruction to create an // object
public Instruction_Action_Create_Object(
CodeSymbolTable symbolTable, Identifier newPrototypeClass,
int newConstructorSignature)
{ prototypeClass=newPrototypeClass; constructorSignature=newConstructorSignature; constructionParameters=((symbolTable.getClass (prototypeClass)) .getMethodAt (constructorSignature) ).getNumberParameters(); } // Destructor which pushes a new object onto the // operand stack
public void run(CodeSymbolTable symbolTable)
{ ClassInterface prototype; ObjectInterface object; CodeMethod method; // Obtain reference to prototype class prototype=symbolTable.getClass(prototypeClass); // Create object object=symbolTable.setCurrentObject(new Structured_Object(prototype)); // Obtain reference to method method=(CodeMethod)prototype.getMethodAt (constructorSignature); // Set formal parameters for method
for (int i=0; i<constructionParameters; i++)
method.setParameter(symbolTable,i, symbolTable.pop()); // Execute method method.run(symbolTable); // Push created object onto operand stack symbolTable.push((AlgebraicOrStructured) symbolTable.popCurrentObject()); } // Destructor which writes a method call to the // output stream
public void list(CodeSymbolTable symbolTable)
{ // Write identifier of prototype class System.out.print("new_"+constructionParameters+ " ("+symbolTable.getSymbolName (prototypeClass)+")"); symbolTable.newListingLine(); } }
[7] The Java VM obviously must have a symbol table for classes.
[8] In truth, the final implementation of the pJHI has SymbolTable as a superinterface, TreeSymbolTable and CodeSymbolTable as subinterfaces, and HashingTreeSymbolTable and HashingCodeSymbolTable as subclasses of HashingSymbolTable which implement these respective interfaces. The other classes which are changed are subclassed and subinterfaced in similar ways.