1
: Introduction

1.1 Evolution Of Language Systems

Language translation is a luxury that was at one time not available to programmers. The pioneers of computing technology were forced to program their bulky machines by adjusting banks of switches. To force a human to adjust her rich expressive command of natural language to a crude long-winded two-way semaphore was to severely reduce the complexity of programs which could be entered, and thus restricted terribly the range of applications computers could be used for.

It was clear something needed to be done. The first step was to give mnemonic names to the binary encoded operation codes (opcodes). In this way, assembly languages were born. But they were still a far cry from natural language, as long sequences of mnemonics were needed to perform simple tasks. Macro assembly languages helped by providing better access to common opcode sequences, but there was still a long way to go.

One approach to programming a computer at a higher-level was to write a program which accepted approximate natural language commands, and executed them in an interactive way. The interpreter was born.

But these interpreters still had to be written in macro assembly languages, and it was apparent that instead of interactive interpreters, "batch interpreters", or compilers, were needed. These would take a string of approximate natural language commands and generate a program in the computer's native language.

While compilers offered a speed advantage, initially the interactivity of interpreters was favoured. With processor architectures failing to settle down, compilers were under constant strain to generate code for new and different machines. However, an interpreter could be quickly written for such machines, with existing programs immediately executable. Interpreters also provided for fast prototyping of program designs. At the time, this was what was desired. Compilers were still developed, but it would take time for the pendulum of favour to swing their way.

Eventually the speed and efficiency of compilers started to win out. The processor architectures around began to stabilise, and the tools of compiler writing were now well developed. Interpreters became a thing of the past.

1.2 The Java Scheme: Bytes

It hasn't been until this decade arguments of software portability and mobility have come to be recognised, and the interpreter is now enjoying a renaissance of sorts. An interpreter is used in the back end of the currently popular Java language to provide platform independence. However, this interpreter, the Java Virtual Machine, instead of executing approximate natural language commands like the interpreters of old, executes a virtual machine code compiled from an object-oriented language, the Java source language.

In this way, we see that Java is a hybrid scheme, as it has a compiler and an interpreter. The Java source language is compiled to the bytecode instructions of a virtual machine. This program can then be executed on any machine which has a Java interpreter. It is in this way that Java achieves portability. The bytecode encoding is used to achieve compactness and security.

As is noted in Lindholm (1997), the Java VM is not inherently interpreted. It is suggested that the virtual machine code could be translated to that of a real machine, or that the processes of an interpreter might be implemented directly in silicon. The aim of these two optimizations is obviously to promote efficiency, something which is not inherent to the design of the Java VM.

While Java in silicon has still not been realised, it is the case that many interpreters do perform just-in-time (JIT) compilation to translate Java bytecodes to native machine code. This approach has been very successful, but is not unique to Java. And there is some argument as to whether the Java bytecode encoding lends itself to efficient JIT compilation when compared with other schemes.

1.3 A New Approach: Trees

Why use a bytecode at all? Now that is a fundamental question. Lindholm (1997) claims that it is compact and secure. The Java VM instruction set has optimizations in it, and JIT compilers have been able to analyse the virtual code sufficiently well to generate reasonably efficient native code. But Java still falls short of being efficient.

Java's main competitor when it comes to efficiency are native compilers. With access to the source program trees, and decades of compiler optimization knowledge, these compilers are capable of producing blisteringly fast code. In fact, automatically generated code now rivals the efficiency of hand-coded fragments.

It is clear that by reducing the source program tree to a supposedbly compact byte code, Java is throwing away information that might well be useful to JIT compilers. Indeed, it is this argument that Franz (1998) uses to discredit the Java approach and promote approaches like his own Juice system which compile syntax trees.

The distribution format of Juice is based on abstract syntax trees instead of on virtual machine bytecodes like Java. It too is a portable code delivery system, designed for use on large distributed networks and taking advantage of graphical user interface technology. However, it could be said that Juice in its present form is less portable than Java in that it requires proprietary browser technology to function- namely the plug-in.

Juice's other main disadvantage is that it is saddled with the Oberon language environment, which while having much in common with Java, and additionally having a small code footprint, is lacking in many of the features expected by today's programmer. But it is not Juice, but what Juice represents, that is more important.

Java claims that its bytecode representation is compact. However, the slim binary distribution format used in Juice is much, much more compact than Java. Java claims that its bytecode representation is secure, but in order to ensure this, the code must be carefully verified using a data-flow analysis. Franz (1998) claims that the abstract syntax of Juice will be able to be verified correct much faster, as it lends itself more readily to such analysis.

But above all this, Juice claims efficiency. Because the source program tree is available to Juice, and there is no overhead involved in extracting or deducing such information from a "flat" bytecode representation, the JIT compilation in Juice can be optimized more heavily, and that optimization can be carried out faster. And not only that, the compactness of slim binaries also leads to a faster download time and therefore an even quicker response.

1.4 Trees Or Bytes?

The question that both Juice and Java leave unanswered is: Would an abstract tree representation, which more closely represents program source, be able to be interpreted faster than a bytecode representation?

It will be the aim of this thesis to test the hypothesis that tree-code is able to be more efficiently interpreted than bytecodes like those used within the Java VM. If this is indeed the case, and the claims of Franz (1998) about Juice are correct, there is absolutely no reason to continue using bytecodes within portable language systems like Java.

If trees can be interpreted faster, and compiled faster and into more efficient code, then abstract program trees will become the commodity of programmers in the future.

Next Chapter Top of Chapter

1 : Introduction

1.1 Evolution Of Language Systems

1.2 The Java Scheme: Bytes

1.3 A New Approach: Trees

1.4 Trees Or Bytes?

1
: Introduction