Bottom of Chapter Contents Previous Chapter

7
: Conclusion And Future Work

We can say from the results that it appears that interpreting tree-based programs is faster than executing equivalent code-based programs. Even in the case where our interpreter is disadvantaged by a strictly unnecessary operand stack in additon to the nested method invocation stack (which suffices to implement correct behaviour), the statement tree outperforms the linear instruction approach.

In the best case, we see a vast potential speed-up arising from a statement interpreter. However, further work on truly optimized interpreters which are more directly comparable to implementations of the Java VM will be needed to determine exactly what is achievable.

7.1 Trees Over Code

The question posed in §1.4 has been answered. Trees are a more efficient mechanism for implementing interpreters. Instruction based approaches are disadvantaged by their use of explicit branch instructions at the language level, while statement based approaches move flow control into the interpreter itself. The Java VM definition implies an explicit operand stack. By using statement trees, we eliminate the need for such an explicit stack.

For applications where speed of execution is important, an interpreter which uses trees may be more desirable than one which uses code. But one has to weigh this advantage against possible disadvantages of a tree based approach.

Intuitively, and particularly by observing the DeepEmptyLoop example, it would seem that trees are a more compact mechanism than code. At least, they seem to be no larger. So this seems to be no legitimate argument against trees.

In terms of security, intuitively it would seem that trees offer more information about the operation of a program than virtual machine code. The virtual machine code has to be decrypted in order to understand the malicious things it might do. A statement based program would seem more readily analysable in this respect, so there appears to be no argument that trees might be less easily verifiable than code.

However, it is true to say that a tree interpreter is more complicated to implement than an instruction based interpreter. It is probably also true that it is more difficult to implement a tree interpreter well. But is it valid to avoid trees simply because they are more difficult to implement and implement well, when there is an advantage in terms of efficiency, and possibly also in terms of compactness and verifiability?

7.2 Building On The Results

Now that we know trees are faster than bytes, the next question to be answered is: how much faster can we interpret trees than bytes? This question has not been satisfactorily answered by this thesis. We can see the potential of the tree based approach, but we cannot say that this will be realised in actuality.

The limited scope of this project meant that it was not feasible to write an interpreter which was directly comparable with a real Java VM. This would need to be done to establish accurate results. Ideally, one would like to either mutate the pJTI or create a new system which, instead of requiring that a comparable instruction interpreter, would have all the same optimizations as the real Java VM:

Use of indexes to reference local variables within invocation frames instead of using a symbol table.
Create special meta-statements for commonly occuring statement sequence, in the same way as the Java VM uses meta-instructions like if_icmplt.

By writing a system using the same tools as those at the disposal of those writing Java VMs, it should be possible to write a statement interpreter that beats purely interpreted Java VMs.

The step beyond getting an accurate comparison is obviously to get a working system which could replace Java. Such a project would constitute a massive software engineering task. But the foundations have already been laid.

7.3 Related Questions

The following sections introduce some issues which may be examined by furure work on the topic of abstract syntax trees.

7.3.1 A Compact Representation For Trees

The Juice project discussed in Franz (1998) has shown that trees lend themselves to a highly compact representation which merges equivalent subtrees within a program and then uses adaptive compression (not unlike that used by the classic LZW algorithm) to further reduce the size. Franz claims that the representation is the most dense executable form that he knows of.

The creation of compact tree representations is an intriguing pursuit. Examining the desirable properties of such trees, achieving better compression and developing faster codecs will no doubt be a growth area of research in coming years. One such project in this area may lead to an eventual replacement for the Java class file format.

7.3.2 Faster Generation Of Portable Code

Generating program trees on the fly from a source editor is the much awaited next step in programming. The compilation bottleneck is still a worrisome limitation on software engineering productivity. Restoring the interactivity possible with the interpreters of days past to current development environments can only benefit the programming fraternity at a time where demand for quickly-made yet robust prototype solutions is rapidly increasing.

If a generalized form of abstract tree which encompasses all languages is developed (which may well be the holy grail of software engineering) it will herald a new era in productivity and code resuability. The first step in this will be the development of portable platform-independent executable formats used over worldwide networks.

7.3.3 Manipulating Abstract Trees

Abstract trees are more readily converted to human readable form than other executable encodings like the Java class file format. It will be interesting to see if the software engineering world embraces such open formats, and if it does, it may lead to a boom in tools which translate abstract trees in various ways so that code can be adapted to meet individual requirements.

7.3.4 Better JIT Compilers

Perhaps the greatest opportunity arising through the use of abstract syntax trees is the possibility of highly optimized JIT compilation. With the full benefit of information from the structure of the source program, it will be possible to examine programs longer and harder to squeeze the most out of any prospective translation.

Systems like the two-tier strategy of code generation mentioned in Franz (1998) mean that JIT may no longer be a once only thing. It is possible that a JIT compiler may generate a crude form of the code required to implement a syntax tree immediately, giving the user control back faster, but then proceed to refine and replace that code in the background as further analysis can be carried out.

JIT compilation is a technology which is sure to provide many exciting research topics in future.

Previous Chapter Top of Chapter

7 : Conclusion And Future Work