Sample Implementation

To test the applicability of our model, we derived the architecture of a Java bytecode processor from it. A simulator implements this model to verify it and to show that adaptivity leads to a measurable performance gain. The following figure shows the architecture of the Java bytecode processor.

Model of a Java bytecode processor

Currently, the simulator adapts the communication structure to the requirements of the application (adding and removing bus connections, splitting and merging buses) and can exchange functional units.
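The four bus operations named above can be illustrated with a small data structure. The following is only a minimal sketch under stated assumptions, not the simulator's actual code: the class `BusTopology`, its method names, and the simplification that each bus is just a set of functional-unit names are all hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model: a bus is the set of functional-unit names attached to it.
public class BusTopology {
    private final Map<Integer, Set<String>> buses = new LinkedHashMap<>();
    private int nextId = 0;

    // Creates an empty bus and returns its stable identifier.
    public int addBus() {
        buses.put(nextId, new LinkedHashSet<>());
        return nextId++;
    }

    // Adding and removing connections to a bus.
    public void connect(int bus, String unit) { buses.get(bus).add(unit); }
    public void disconnect(int bus, String unit) { buses.get(bus).remove(unit); }

    // Splitting: the listed units that are on `bus` move to a newly created bus.
    public int split(int bus, Set<String> unitsToMove) {
        int created = addBus();
        for (String unit : unitsToMove) {
            if (buses.get(bus).remove(unit)) {
                buses.get(created).add(unit);
            }
        }
        return created;
    }

    // Merging: all units of `from` are attached to `into`, and `from` disappears.
    public void merge(int into, int from) {
        if (into == from) return;
        buses.get(into).addAll(buses.remove(from));
    }

    public Set<String> unitsOn(int bus) { return buses.get(bus); }
    public int busCount() { return buses.size(); }

    public static void main(String[] args) {
        BusTopology topology = new BusTopology();
        int shared = topology.addBus();
        for (String unit : new String[] {"ALU", "ObjectHeap", "JumpUnit", "MethodStack"}) {
            topology.connect(shared, unit);
        }
        // Give the two heavily used units of a data-dominated program their own bus.
        int dataBus = topology.split(shared,
                new LinkedHashSet<>(Arrays.asList("ALU", "ObjectHeap")));
        System.out.println(topology.unitsOn(shared));  // [JumpUnit, MethodStack]
        System.out.println(topology.unitsOn(dataBus)); // [ALU, ObjectHeap]
        topology.merge(shared, dataBus);
        System.out.println(topology.busCount());       // 1
    }
}
```

In this sketch a unit sits on exactly one bus after a split. A real design may allow a unit to be connected to several buses at once, which `connect` already permits.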

Test Applications

To demonstrate the effectiveness of adaptivity in the Java bytecode processor, we selected two test applications with different characteristics that are expected to lead to different structures: a data-dominated application (signal convolution) and a control-dominated application (calculation of Ackermann's function).

A Java program that calculates the convolution of two signals was used as the data-dominated application. The control-dominated application calculates Ackermann's function ack(3,2), which mainly results in recursive method calls and if-then-else structures. The resulting structures differ because the two test applications use some functional units very differently. The convolution program makes heavy use of the object heap and the ALU, whereas the Ackermann program mainly uses the jump unit and the method stack. The operand stack and the local variable memory are used in the same manner by both applications. The different characteristics are reflected in the bytecodes that occur. The convolution program consists basically of array and ALU operations. In contrast, the Ackermann program uses if-bytecodes, invokes, returns and only a few ALU operations.
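For orientation, the two kinds of workload can be sketched as follows. This is not the original test code: the class name `TestApplications`, the integer signal type, and the sample inputs are assumptions made for illustration. The convolution loop consists almost entirely of array accesses and arithmetic, whereas the Ackermann function consists almost entirely of comparisons, invocations and returns.

```java
import java.util.Arrays;

public class TestApplications {

    // Data-dominated: full discrete convolution, y[n] = sum over k of x[k] * h[n - k].
    public static int[] convolve(int[] x, int[] h) {
        int[] y = new int[x.length + h.length - 1];
        for (int i = 0; i < x.length; i++) {
            for (int j = 0; j < h.length; j++) {
                y[i + j] += x[i] * h[j];   // array loads/stores and ALU operations
            }
        }
        return y;
    }

    // Control-dominated: Ackermann's function in its usual two-argument recursive form.
    public static int ack(int m, int n) {
        if (m == 0) return n + 1;               // if-bytecode and return
        if (n == 0) return ack(m - 1, 1);       // recursive invoke
        return ack(m - 1, ack(m, n - 1));       // nested recursive invokes
    }

    public static void main(String[] args) {
        int[] y = convolve(new int[] {1, 2, 3}, new int[] {1, 1});
        System.out.println(Arrays.toString(y)); // [1, 3, 5, 3]
        System.out.println(ack(3, 2));          // 29
    }
}
```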

Results

To evaluate the speedup of dynamic bus adaptation and functional unit (FU) exchange compared with a static architecture, both test programs were run in the simulator in different modes. First, we measured the minimal-cost configuration (worst-case performance), which consists of one single bus for all components and the slowest FUs. Then we measured the best-case performance, using faster FUs wherever they led to a performance improvement. It turns out that the maximum speedup is between 23% and 29%.

Second, we ran the applications in adaptive mode with a cost limit. The adaptive circuit for the convolution program performs 6% slower than the best-case architecture, and the circuit for the Ackermann program is less than 1% slower than the best-case circuit. In both cases, the cost limit is set to 83% of the cost of the best-case circuit. The next figure shows clearly that the adaptive circuit is nearly as fast as the best-case circuit but requires only half of its cost increase.

Model of a Java bytecode processor
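The two cost statements above (a limit of 83% of the best-case cost, and half the cost increase) can be related with a short calculation. This is only a consistency check under assumptions that the text does not state explicitly: the adaptive circuit is assumed to use its full cost limit, and the symbols C_min (cost of the single-bus, slowest-FU circuit), C_best (cost of the best-case circuit) and C_lim (cost limit) are introduced here for illustration.

```latex
\begin{align*}
C_{\text{lim}} &= 0.83\, C_{\text{best}} \\
C_{\text{lim}} - C_{\text{min}} &= \tfrac{1}{2}\,\bigl(C_{\text{best}} - C_{\text{min}}\bigr) \\
0.83\, C_{\text{best}} - C_{\text{min}} &= 0.5\, C_{\text{best}} - 0.5\, C_{\text{min}} \\
0.33\, C_{\text{best}} &= 0.5\, C_{\text{min}} \\
C_{\text{min}} &= 0.66\, C_{\text{best}}
\end{align*}
```

Under these assumptions, the minimal-cost circuit would cost roughly two thirds of the best-case circuit. This value is inferred from the figures quoted above, not measured.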

The following figure shows the speedup in relation to the cost limit.

Model of a Java bytecode processor

The non-monotonic behavior of the runtime is due to the coarse-grained cost of the functional units. As a result, cost points can be transferred to the FU area, which degrades the communication structure. This behavior can probably be eliminated with a more elaborate heuristic.

Future Work

The aim of our future research will be:

The investigation of other architectures suitable for the AMIDAR processing model. We believe that nearly every computing architecture with relatively powerful instructions is suitable for the AMIDAR model. These instructions will be split into tokens which are executed in parallel (see Model).
The development of real hardware that is capable of reconfiguring itself according to the AMIDAR model. This requires substantial research to develop new synthesis algorithms and tools suitable for execution inside the system.