|title:||QuickInterp - JIT-like performance in a cross-platform cross-CPU way|
Tom van Dijk ,
The performance of Java Virtual Machine (JVM) bytecode interpreters can be severely limited by the (1) inability to perform optimizations over multiple instructions, and (2) the excessive level of branching with the interpreter loop as a result of having to do at least one jump per bytecode instruction. With the QuickInterp VM we mitigate these performance limitations within JVM interpreter design by means of superinstructions. Our interpreter is improved by supporting not just regular bytecode instructions, but also extra instructions which perform the work of a sequence of bytecode instructions (superinstructions). Bytecode is, at class load time, preprocessed where sequences of bytecode instructions are replaced by such equivalent superinstructions, requiring no alterations to or compatibility loss with existing bytecode. The interpreter source code is generated automatically based on a profile of the running application. New instruction handlers are generated by concatenating the instruction handlers of existing instructions, removing the need for manual construction of the superinstruction handlers.
Which sequences of instructions to convert into a superinstruction, and how to perform the most effective substitution of superinstructions into a loaded program, are questions which we answer in this thesis. Earlier work has shown that finding the optimal superinstruction set is NP-hard. As such, we implement an iterative optimization algorithm to find the optimal superinstruction set. Furthermore, we design and test various runtime substitution algorithms. Our new shortest-path based runtime substitution algorithm uses shortest-path based pathfinding through the input program to find the combination of superinstructions that lowers the amount of required instruction dispatches as much as possible. Furthermore, we enhance the shortest-path algorithm by doing substitution based on equivalence, extending the impact each individual superinstruction can make at runtime. We compare our new runtime substitution algorithms against a reimplementation of a substitution algorithm from earlier work.
Our methods show that the superinstruction optimization is still valid in 2020, boasting a 45.6% performance improvement against baseline in a small arithmetic benchmark, and showing a 33.0% improvement against baseline in a larger Spring Boot-based web application benchmark. Our iterative superinstruction set construction algorithm manages to find near-optimal solutions for the NP-hard problem of constructing the superinstruction set. However, we also show how more advanced superinstruction placement algorithms do not offer the same return on investment. Given enough superinstructions, each of the tested substitution algorithms is capable of achieving similar performance improvements.
The code and benchmarks are available at https://github.com/LukasMiedema/QuickInterp.