Java 7: How to write really fast Java code
When I first wrote this blog, my intention was to introduce you to ThreadLocalRandom, a class that is new in Java 7 for generating random numbers. I have analyzed the performance of ThreadLocalRandom in a series of micro-benchmarks to find out how it performs in a single-threaded environment.
The results were relatively surprising: although the code is very similar, ThreadLocalRandom is twice as fast as Math.random()! The results drew my interest and I decided to investigate this a little further. I have documented my analysis process. It is an exemplary introduction to the analysis steps, technologies and some of the JVM diagnostic tools required to understand differences in the performance of small code segments. Some experience with the described toolset and technologies will enable you to write faster Java code for your specific HotSpot target environment.
OK, that’s enough talk, let’s get started! My machine is an ordinary 32-bit Intel x86 dual core running Windows XP.
Math.random() works on a static singleton instance of Random, whilst ThreadLocalRandom.current().nextDouble() works on a thread-local instance of ThreadLocalRandom, which is a subclass of Random. ThreadLocalRandom introduces the overhead of a variable lookup on each call to the current() method. Considering what I’ve just said, it’s really a little surprising that it’s twice as fast as Math.random() in a single thread, isn’t it? I didn’t expect such a significant difference.
Again, I am using a tiny micro-benchmarking framework presented in one of Heinz’s blog posts. The framework that Heinz developed takes care of several challenges in benchmarking Java programs on modern JVMs. These challenges include warm-up, garbage collection, the accuracy of Java’s time API, verification of test accuracy and so forth.
Here are my runnable benchmark classes:
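As a rough idea of what such benchmark classes can look like, here is a minimal sketch. The class names MathRandomGenerator and ThreadLocalRandomGenerator and the interface name RunnableBenchmark come from this post; the shape of the interface, the loop count of 1000 and the BenchmarkSketch driver are assumptions made for illustration, and the driver does not replace Heinz’ framework (no warm-up, no timing).

```java
import java.util.concurrent.ThreadLocalRandom;

// Assumed shape of the interface: a Runnable that exposes its result,
// so the JIT compiler cannot treat the loop as dead code.
interface RunnableBenchmark extends Runnable {
    double getResult();
}

class MathRandomGenerator implements RunnableBenchmark {
    private double r; // field that keeps the computed value observable

    @Override
    public void run() {
        for (int i = 0; i < 1000; i++) { // loop count is an assumption
            r = r + Math.random();
        }
    }

    @Override
    public double getResult() {
        return r;
    }
}

class ThreadLocalRandomGenerator implements RunnableBenchmark {
    private double r;

    @Override
    public void run() {
        for (int i = 0; i < 1000; i++) {
            r = r + ThreadLocalRandom.current().nextDouble();
        }
    }

    @Override
    public double getResult() {
        return r;
    }
}

// Hypothetical driver: runs each benchmark once and prints the result.
class BenchmarkSketch {
    public static void main(String[] args) {
        RunnableBenchmark[] benchmarks = {
            new MathRandomGenerator(), new ThreadLocalRandomGenerator()
        };
        for (RunnableBenchmark b : benchmarks) {
            b.run();
            System.out.println(b.getClass().getSimpleName() + ": " + b.getResult());
        }
    }
}
```

Both classes do the same work in the loop; the only difference is which random number source they call, which is what makes the comparison meaningful.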
Let’s run the benchmark using Heinz’ framework:
Note: To make sure the JVM does not identify the code as “dead code”, I return a field variable and print out the result of my benchmarking immediately. That’s why my runnable classes implement an interface called RunnableBenchmark.
I am running this benchmark three times. The first run is in default mode, with inlining and JIT optimization enabled:
Then again without JIT optimization (VM option -Xint):
The last test is with JIT optimization, but with -XX:MaxInlineSize=0, which (almost) disables inlining:
Let’s interpret the results carefully: With full JVM JIT optimization, ThreadLocalRandom is twice as fast as Math.random(). With JIT optimization turned off, the two perform equally well (or equally badly). Method inlining seems to account for about 30% of the performance difference. The remaining difference may be due to other optimization techniques.
One reason why the JIT compiler can tune ThreadLocalRandom more effectively is the improved implementation of ThreadLocalRandom.next().
The first snippet shows Random.next(), which is used intensively in the benchmark of Math.random(). Compared to ThreadLocalRandom.next(), the method requires significantly more instructions, although both methods do the same thing. In the Random class, the seed variable stores global state shared by all threads, and it changes with every call to the next() method. Therefore an AtomicLong is required to safely access and change the seed value in calls to nextDouble(). ThreadLocalRandom, on the other hand, is (well) thread local :-) Its next() method does not have to be thread safe and can use an ordinary long variable as the seed value.
About method inlining and ThreadLocalRandom
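Since the size of next() matters for everything that follows, it helps to see the two styles of seed update side by side. This is a minimal sketch, not the JDK source: the class names SharedSeedGenerator, LocalSeedGenerator and SeedUpdateSketch are hypothetical, the linear congruential formula and its constants are the ones documented in the java.util.Random API documentation, and the initial seed scrambling done by the real classes is left out.

```java
import java.util.concurrent.atomic.AtomicLong;

// Shared-state variant: the seed may be touched by many threads,
// so every update goes through a compare-and-set retry loop.
class SharedSeedGenerator {
    private final AtomicLong seed;

    SharedSeedGenerator(long initialSeed) {
        this.seed = new AtomicLong(initialSeed & ((1L << 48) - 1));
    }

    int next(int bits) {
        long oldSeed;
        long nextSeed;
        do {
            oldSeed = seed.get();
            nextSeed = (oldSeed * 0x5DEECE66DL + 0xBL) & ((1L << 48) - 1);
        } while (!seed.compareAndSet(oldSeed, nextSeed)); // retry if another thread won
        return (int) (nextSeed >>> (48 - bits));
    }
}

// Thread-confined variant: only one thread ever sees the seed,
// so a plain long field and a plain assignment are enough.
class LocalSeedGenerator {
    private long seed;

    LocalSeedGenerator(long initialSeed) {
        this.seed = initialSeed & ((1L << 48) - 1);
    }

    int next(int bits) {
        seed = (seed * 0x5DEECE66DL + 0xBL) & ((1L << 48) - 1);
        return (int) (seed >>> (48 - bits));
    }
}

// Hypothetical driver: checks that both variants produce the same numbers.
class SeedUpdateSketch {
    public static void main(String[] args) {
        SharedSeedGenerator shared = new SharedSeedGenerator(42L);
        LocalSeedGenerator local = new LocalSeedGenerator(42L);
        boolean same = true;
        for (int i = 0; i < 10; i++) {
            same &= shared.next(32) == local.next(32);
        }
        System.out.println("same sequence: " + same);
    }
}
```

Both variants compute the same sequence; the shared one simply pays for an atomic read, a compare-and-set and a possible retry on every call, which is also why its method body is larger.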
One very effective JIT optimization is method inlining. In hot paths that are executed frequently, the HotSpot compiler decides to inline the code of called methods (child methods) into the caller’s method (parent method). “Inlining has important benefits. It dramatically reduces the dynamic frequency of method invocations, which saves the time needed to perform those method invocations. But even more importantly, inlining produces much larger blocks of code for the optimizer to work on. This creates a situation that significantly increases the effectiveness of traditional compiler optimizations, overcoming a major obstacle to increased Java programming language performance.”
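To make the idea concrete, here is a hand-written illustration of what inlining amounts to. It is only a sketch at source level: the JIT compiler performs this transformation on compiled code, not on Java source, and the class InliningSketch and its methods are hypothetical names invented for this example.

```java
class InliningSketch {
    // Small "child" method that is called from a hot loop.
    static int scale(int x) {
        return 3 * x + 1;
    }

    // "Parent" method as written: one method invocation per iteration.
    static long withCalls(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += scale(i);
        }
        return sum;
    }

    // The same loop with the body of scale() pasted in by hand.
    // No call remains, and the optimizer sees one larger block of code.
    static long inlinedByHand(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += 3 * i + 1;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(withCalls(1000) == inlinedByHand(1000));
    }
}
```

Both methods return the same value; the second form is roughly what the compiler can work with once it has decided that scale() is small and hot enough to inline.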
Since Java 7 you can monitor method inlining by using diagnostic JVM options. Running the code with ‘-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining’ will show the inlining efforts of the JIT compiler. Here are the relevant sections of the output for the Math.random() benchmark:
The JIT compiler cannot inline the Random.next() method that is called in Random.nextDouble(). This is the inlining output of ThreadLocalRandom:
Because the next() method is shorter (31 bytes), it can be inlined. Since the next() method is called intensively in both benchmarks, this log suggests that method inlining may be one reason why ThreadLocalRandom performs significantly faster.
To verify that and to find out more, we need to dive into the assembly code. With Java 7 JDKs it is possible to print the assembly code to the console. See here on how to enable the -XX:+PrintAssembly VM option. The option prints the JIT-optimized code, which means you can see the code the JVM actually executes. I have copied the relevant assembly code into the links below.
Assembly code of ThreadLocalRandomGenerator.run() here.
Assembly code of MathRandomGenerator.run() here.
Assembly code of Random.next() called by Math.random() here.
Assembly code is machine-specific, low-level code, and it’s more complicated to read than bytecode. Let’s try to verify that method inlining has a relevant effect on performance in my benchmarks, and ask: are there other obvious differences in how the JIT compiler treats ThreadLocalRandom and Math.random()? In ThreadLocalRandomGenerator.run() there is no procedure call to any of the subroutines like nextDouble() or next(). There is only one virtual (hence expensive) method call to ThreadLocal.get() visible (see line 35 in the ThreadLocalRandomGenerator.run() assembly). All the other code is inlined into ThreadLocalRandomGenerator.run(). In the case of MathRandomGenerator.run() there are two virtual method calls to Random.next() (see block B4, line 204 ff. in the assembly code of MathRandomGenerator.run()). This confirms our suspicion that method inlining is one important root cause of the performance difference. Furthermore, due to the synchronization hassle, considerably more (and some expensive!) assembly instructions are required in Random.next(), which is also counterproductive in terms of execution speed.
Understanding the overhead of the invokevirtual instruction
So why is (virtual) method invocation expensive and method inlining so effective? The pointer in an invokevirtual instruction is not an offset of a concrete method in a class instance. The compiler does not know the internal layout of a class instance. Instead, it generates symbolic references to the methods of an instance, which are stored in the runtime constant pool. Those runtime constant pool items are resolved at run time to determine the actual method location. This dynamic (run-time) binding requires verification, preparation and resolution, which can considerably affect performance. (See Invoking Methods and Linking in the JVM Spec for details.)
That’s all for now. The disclaimer: Of course, the list of topics you need to understand to solve performance riddles is endless. There is a lot more to understand than micro-benchmarking, JIT optimization, method inlining, Java bytecode, assembly language and so forth. Also, there are a lot more root causes of performance differences than just virtual method calls or expensive thread synchronization instructions. However, I think the topics I have introduced are a good start for this kind of deep dive. Looking forward to critical and enjoyable comments!
References: “Java 7: How to write really fast Java code” from our JCG partner Niklas.
Source: http://www.javacodegeeks.com/2012/01/java-7-how-to-write-really-fast-java.html?utm_content=buffer4acc4&utm_medium=social&utm_source=facebook.com&utm_campaign=buffer