In this chapter, we will look at the empirical results obtained by running multiple trials of each benchmark under the three interpreter implementations. From these results we will be able to determine which implementation of each benchmark is most efficient and, by considering all the results together, which implementation was the most successful overall.
By determining the most efficient implementation, and comparing the differences in efficiency between implementations, we should be able to answer the question posed in §1.4: can tree representations be interpreted faster than code representations? We will also be able to comment on the circumstances in which the difference widens or narrows.
All trials were conducted on the same Power Macintosh 6100/60 running Mac OS 8 and Metrowerks Java, with the system under the same load conditions; elapsed times were measured with the system clock. Shaded trials have not been included in the averages.
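The thesis does not reproduce the trial harness itself, but the elapsed-time measurement it describes can be sketched as follows. This is an illustrative reconstruction, not the original code: the names `TrialHarness`, `Benchmark`, and `timeTrial` are ours, and `System.currentTimeMillis()` (the only wall-clock source available in Java of that era) offers only millisecond resolution.

```java
// Hypothetical trial harness; all names are illustrative, not from the thesis.
public class TrialHarness {
    // Stands in for one of the benchmark programs (EmptyLoop, Fibonacci, ...).
    interface Benchmark {
        void run();
    }

    // Times a single trial with the system clock, returning elapsed seconds.
    static double timeTrial(Benchmark b) {
        long start = System.currentTimeMillis();
        b.run();
        long stop = System.currentTimeMillis();
        return (stop - start) / 1000.0;
    }

    public static void main(String[] args) {
        // A stand-in for EmptyLoop: a loop with an empty body.
        Benchmark emptyLoop = () -> {
            for (int i = 0; i < 1_000_000; i++) { /* empty body */ }
        };
        // Ten trials per set, as in the tables below.
        for (int trial = 1; trial <= 10; trial++) {
            System.out.printf("Trial %d: %.3f s%n", trial, timeTrial(emptyLoop));
        }
    }
}
```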
Twenty trials of EmptyLoop were carried out for each of the three implementations. The results were as follows (all times are in seconds):
| Trial set 1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Code | 7.671 | 6.172 | 6.157 | 6.169 | 7.373 | 6.184 | 6.218 | 7.05 | 6.136 | 6.155 |
| Tree | 6.232 | 6.077 | 6.076 | 6.143 | 6.166 | 6.106 | 6.054 | 6.118 | 6.035 | 6.141 |
| Opt. Tree | 1.485 | 1.588 | 1.45 | 1.651 | 1.597 | 1.432 | 1.56 | 1.442 | 1.54 | 1.419 |

| Trial set 2 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Code | 7.892 | 6.245 | 6.272 | 6.677 | 6.264 | 6.226 | 7.503 | 6.239 | 6.303 | 6.258 |
| Tree | 6.644 | 6.428 | 6.447 | 6.397 | 6.575 | 6.411 | 6.421 | 6.516 | 6.377 | 6.435 |
| Opt. Tree | 2.736 | 1.723 | 1.615 | 1.757 | 1.783 | 1.592 | 1.756 | 1.705 | 1.591 | 1.74 |

| | Average of trial set 1 | Average of trial set 2 | Average overall |
|---|---|---|---|
| Code | 6.402 | 6.443 | 6.422 |
| Tree | 6.102 | 6.445 | 6.274 |
| Opt. Tree | 1.520 | 1.696 | 1.608 |

Table 6.1 EmptyLoop Results
For EmptyLoop, the stacking tree interpreter is 2.32% faster than the code interpreter, while the optimized tree interpreter is 74.96% faster.
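These percentages are derived from the overall averages by the usual relative speed-up formula, (1 − faster/baseline) × 100. The sketch below recomputes them from the rounded averages in Table 6.1 (the class and method names are ours); small differences from the figures quoted in the text are rounding artifacts, since the thesis presumably worked from unrounded averages.

```java
// Recomputing the quoted speed-ups from the overall averages in Table 6.1.
public class SpeedUp {
    // Percentage by which 'faster' improves on 'baseline'.
    static double speedUp(double baseline, double faster) {
        return (1.0 - faster / baseline) * 100.0;
    }

    public static void main(String[] args) {
        // Overall averages for EmptyLoop (Table 6.1), in seconds.
        double code = 6.422, tree = 6.274, optTree = 1.608;
        System.out.printf("Tree vs. code:      %.2f%%%n", speedUp(code, tree));
        System.out.printf("Opt. tree vs. code: %.2f%%%n", speedUp(code, optTree));
    }
}
```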
Twenty trials of DeepEmptyLoop were carried out for each of the three implementations. The results were as follows (all times are in seconds):
| Trial set 1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Code | 16.862 | 16.871 | 16.979 | 16.828 | 16.849 | 16.911 | 16.781 | 16.893 | 16.832 | 16.782 |
| Tree | 16.785 | 16.596 | 16.652 | 16.797 | 16.617 | 16.605 | 16.656 | 16.682 | 16.653 | 16.653 |
| Opt. Tree | 4.867 | 4.799 | 4.699 | 4.794 | 4.741 | 4.799 | 4.772 | 4.733 | 4.81 | 4.728 |

| Trial set 2 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Code | 16.882 | 16.733 | 16.883 | 16.815 | 16.796 | 16.839 | 16.817 | 16.909 | 16.772 | 16.791 |
| Tree | 16.763 | 16.569 | 16.651 | 16.785 | 16.579 | 16.65 | 16.688 | 16.619 | 16.645 | 16.659 |
| Opt. Tree | 5.002 | 4.799 | 4.81 | 4.914 | 4.828 | 4.94 | 4.805 | 4.831 | 4.881 | 4.888 |

| | Average of trial set 1 | Average of trial set 2 | Average overall |
|---|---|---|---|
| Code | 16.858 | 16.817 | 16.838 |
| Tree | 16.657 | 16.649 | 16.653 |
| Opt. Tree | 4.774 | 4.870 | 4.822 |

Table 6.2 DeepEmptyLoop Results
For DeepEmptyLoop, the stacking tree interpreter is 1.10% faster than the code interpreter, while the optimized tree interpreter is 71.36% faster.
Twenty trials of FieldAccess were carried out for each of the three implementations. The results were as follows (all times are in seconds):
| Trial set 1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Code | 8.306 | 8.275 | 8.283 | 8.225 | 8.313 | 8.251 | 8.181 | 8.278 | 8.276 | 8.308 |
| Tree | 8.182 | 8.167 | 8.077 | 8.033 | 8.118 | 8.136 | 8.134 | 8.12 | 8.143 | 8.109 |
| Opt. Tree | 2.994 | 2.919 | 2.956 | 3.045 | 2.947 | 3.02 | 2.996 | 2.955 | 2.924 | 2.98 |

| Trial set 2 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Code | 8.557 | 8.49 | 8.79 | 8.475 | 8.483 | 8.53 | 8.428 | 8.506 | 8.522 | 8.671 |
| Tree | 8.457 | 8.503 | 8.555 | 8.485 | 8.546 | 8.399 | 8.411 | 8.496 | 8.529 | 8.453 |
| Opt. Tree | 3.134 | 3.188 | 3.085 | 3.111 | 3.142 | 3.089 | 3.19 | 3.081 | 3.187 | 3.137 |

| | Average of trial set 1 | Average of trial set 2 | Average overall |
|---|---|---|---|
| Code | 8.270 | 8.545 | 8.407 |
| Tree | 8.122 | 8.483 | 8.303 |
| Opt. Tree | 2.974 | 3.134 | 3.054 |

Table 6.3 FieldAccess Results
For FieldAccess, the stacking tree interpreter is 1.25% faster than the code interpreter, while the optimized tree interpreter is 63.67% faster.
Twenty trials of MethodCall were carried out for each of the three implementations. The results were as follows (all times are in seconds):
| Trial set 1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Code | 11.088 | 12.335 | 10.992 | 12.011 | 11.038 | 11.179 | 11.058 | 11.207 | 11.086 | 11.152 |
| Tree | 11.485 | 11.342 | 11.501 | 11.316 | 11.434 | 11.304 | 11.433 | 11.447 | 11.402 | 11.37 |
| Opt. Tree | 4.867 | 4.818 | 4.807 | 4.781 | 4.84 | 4.898 | 4.77 | 4.782 | 4.747 | 4.847 |

| Trial set 2 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Code | 11.336 | 12.084 | 12.163 | 11.297 | 11.299 | 12.081 | 11.982 | 11.402 | 11.444 | 11.336 |
| Tree | 11.531 | 11.312 | 11.498 | 11.379 | 11.434 | 11.322 | 11.515 | 11.449 | 11.46 | 11.299 |
| Opt. Tree | 4.896 | 4.728 | 4.761 | 4.74 | 4.83 | 4.819 | 4.741 | 4.745 | 4.712 | 4.858 |

| | Average of trial set 1 | Average of trial set 2 | Average overall |
|---|---|---|---|
| Code | 11.315 | 11.642 | 11.479 |
| Tree | 11.403 | 11.420 | 11.412 |
| Opt. Tree | 4.816 | 4.783 | 4.799 |

Table 6.4 MethodCall Results
For MethodCall, the stacking tree interpreter is 0.58% faster than the code interpreter, while the optimized tree interpreter is 58.19% faster.
Twenty trials of OftenTrue were carried out for each of the three implementations. The results were as follows (all times are in seconds):
| Trial set 1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Code | 14.797 | 14.688 | 14.853 | 14.731 | 14.798 | 14.677 | 14.824 | 14.758 | 14.727 | 14.797 |
| Tree | 14.696 | 14.559 | 14.656 | 14.632 | 14.527 | 14.636 | 14.636 | 14.542 | 14.577 | 14.593 |
| Opt. Tree | 3.581 | 3.56 | 3.481 | 3.62 | 3.507 | 3.56 | 3.568 | 3.505 | 3.595 | 3.479 |

| Trial set 2 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Code | 14.796 | 14.605 | 14.779 | 14.669 | 14.791 | 14.685 | 14.798 | 14.726 | 14.811 | 14.815 |
| Tree | 14.669 | 14.527 | 14.689 | 14.586 | 14.526 | 14.617 | 14.669 | 14.483 | 14.595 | 14.64 |
| Opt. Tree | 3.536 | 3.566 | 3.462 | 3.621 | 3.517 | 3.474 | 3.5 | 3.491 | 3.577 | 3.461 |

| | Average of trial set 1 | Average of trial set 2 | Average overall |
|---|---|---|---|
| Code | 14.765 | 14.748 | 14.746 |
| Tree | 14.605 | 14.600 | 14.603 |
| Opt. Tree | 3.546 | 3.521 | 3.533 |

Table 6.5 OftenTrue Results
For OftenTrue, the stacking tree interpreter is 1.04% faster than the code interpreter, while the optimized tree interpreter is 76.06% faster.
Twenty trials of Fibonacci were carried out for each of the three implementations. The results were as follows (all times are in seconds):
| Trial set 1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Code | 1.601 | 1.376 | 1.47 | 1.356 | 1.59 | 1.475 | 1.364 | 1.547 | 1.39 | 1.462 |
| Tree | 1.559 | 1.485 | 1.324 | 1.428 | 1.4 | 1.46 | 1.32 | 1.425 | 1.331 | 1.509 |
| Opt. Tree | 0.83 | 0.917 | 0.737 | 0.74 | 0.836 | 0.743 | 0.735 | 0.878 | 0.805 | 0.731 |

| Trial set 2 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Code | 1.396 | 1.558 | 1.522 | 1.38 | 1.586 | 1.385 | 1.55 | 1.406 | 1.51 | 1.529 |
| Tree | 1.593 | 1.38 | 1.497 | 1.517 | 1.449 | 1.484 | 1.374 | 1.523 | 1.384 | 1.488 |
| Opt. Tree | 0.766 | 0.985 | 0.746 | 0.751 | 0.756 | 0.958 | 0.75 | 0.753 | 0.94 | 0.746 |

| | Average of trial set 1 | Average of trial set 2 | Average overall |
|---|---|---|---|
| Code | 1.463 | 1.482 | 1.473 |
| Tree | 1.424 | 1.469 | 1.447 |
| Opt. Tree | 0.795 | 0.815 | 0.805 |

Table 6.6 Fibonacci Results
For Fibonacci, the stacking tree interpreter is 1.78% faster than the code interpreter, while the optimized tree interpreter is 45.33% faster.
As expected, EmptyLoop showed the greatest speed-up under the worst-case (stacking) tree interpreter, and very nearly the greatest under the best-case (optimized) one. This indicates that statement-based interpreters do derive an advantage from not having to execute explicit branch instructions.
OftenTrue was the best performer in terms of potential speed-up, which can be attributed to its heavy use of the operand stack. This benefits the optimized tree interpreter significantly but, as the results show, has no effect at all on the stacking tree interpreter.
DeepEmptyLoop also showed a high potential speed-up, but under the stacking tree interpreter this advantage was lost: its results are diluted by the sheer amount of data transfer taking place relative to the amount of loop mechanics.
Clearly, loops benefit most from a tree interpreter, as expected. Even Fibonacci, the "real-world" example, showed a 45% speed-up in the best case and a little shy of a 2% speed-up in the worst case.