EDIT: The metrics below are skewed. Not all JVMs are created equal. For updated metrics on the Pi, please see JVM Choice Matters (a lot!).
What better thing to do on Pi Day than to calculate Pi? I figured while I’m at it, I might as well do some performance testing of the Raspberry Pi 2 B. I’m using the following for my comparisons:
Host | Description | Bits |
---|---|---|
Pi 2B | Quad core, 900 MHz ARMv7, 1 GB RAM | 32 |
Pi B+ | Single core, 700 MHz ARMv6, 512 MB RAM | 32 |
Argentum | Quad core AMD A10-5757M at 1400 MHz (variable clocking, can go higher), 12 GB RAM | 64 |
Nimbus | Eight core AMD FX-8320 at 3515 MHz, 32 GB RAM | 64 |
I'm using a variant of the code at Calculate Pi Inefficiently; I've edited it to allow specifying the number of terms on the command line:
```java
// PiSlow: pi from slowly converging series
public class PiSlow {
    public static void main(String[] args) {
        double sum = 0.0;                       // final sum
        double term;                            // term without sign
        double sign = 1.0;                      // sign on each term
        int N = Integer.parseInt(args[0]);      // number of terms

        System.out.println("Calculating " + N + " terms");

        for (int k = 0; k < N; k++) {
            term = 1.0/(2.0*k + 1.0);
            sum = sum + sign*term;
            sign = -sign;
        }

        System.out.println("Final pi/4 (approx., " + N + " terms): " + sum);
        System.out.println("Actual pi/4: " + Math.PI/4.0);
        System.out.println("Final pi (approx., " + N + " terms): " + sum*4.0);
        System.out.println("Actual pi: " + Math.PI);
    }
}
```
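For reference, the loop is accumulating the partial sum of the Leibniz series, which converges (very slowly) toward pi/4; this is exactly what the `term` and `sign` updates implement:

$$\frac{\pi}{4} \approx \sum_{k=0}^{N-1} \frac{(-1)^k}{2k+1} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots$$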
I am using the following script to run the timings:
```bash
for i in 1000 10000 100000 1000000 10000000 100000000 1000000000
do
    time java -XX:+PrintGC -Xloggc:gclog$i PiSlow $i
    echo ""
done
```
I’m running the script seven (7) times in order to let the host warm up some.
I don't expect there to be any garbage collections, but just in case ;-). One difference between the original version and this one is the I/O: the version I'm running here prints far less (and runs much more quickly as a result).
Results (median real time, in seconds)
Number of Terms | Argentum | Nimbus | Pi 2B | Pi B+ |
---|---|---|---|---|
1000 | 0.124 | 0.075 | 0.706 | 1.42 |
10000 | 0.131 | 0.075 | 0.729 | 1.426 |
100000 | 0.144 | 0.081 | 0.664 | 1.512 |
1000000 | 0.143 | 0.088 | 0.726 | 2.195 |
10000000 | 0.1995 | 0.1145 | 1.5165 | 9.217 |
100000000 | 0.5355 | 0.3835 | 8.2345 | 79.5255 |
1000000000 | 3.8885 | 3.062 | 74.6855 | 781.9705 |
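The table values are the medians over the seven runs. The post doesn't show how those medians were computed; a minimal Java sketch (the class name `Median` and its command-line interface are my own, just for illustration) could look like this:

```java
import java.util.Arrays;

// Median of a set of timing samples. Not from the original post: just a
// sketch of how the "median of 7 runs" values in the table could be computed.
public class Median {
    static double median(double[] samples) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        if (n % 2 == 1) {
            return sorted[n / 2];                           // odd count: middle value
        }
        return (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;   // even count: average the two middle values
    }

    public static void main(String[] args) {
        // Pass the real-time measurements (in seconds) on the command line,
        // one value per run of the timing script.
        double[] samples = new double[args.length];
        for (int i = 0; i < args.length; i++) {
            samples[i] = Double.parseDouble(args[i]);
        }
        System.out.println("Median: " + median(samples));
    }
}
```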
Interpreting the results, I observe the following:
- It looks as though the majority of the time for runs up to 1000000 terms is JVM initialization. Around 100000000 terms the 64-bit hosts start to be more efficient; they do not show the huge jump in elapsed time that the two Pis do. (A sketch of how the startup cost could be separated from the computation appears after this list.)
- Garbage Collection is not invoked
- Since the code is single threaded, the extra cores shouldn’t make much of a difference.
- The Pi 2B is running Oracle Java 1.8 from the Raspbian distribution; the other three hosts are running OpenJDK 1.7.0_75. I may re-run on the Pi B+ using Oracle Java 1.8; I halfway expect an improvement in performance there. Up to 10000000 terms, the Pi 2B runs at ~2-3x the performance of the Pi B+. After that it goes to ~9x.
- I think that the HotSpot JIT compiler is kicking in around 100000000 terms, which could explain the radical divergence in runtimes.
- I think that floating point support on ARM is not as good as on AMD64; additionally, ARMv6 and ARMv7 have different floating point capabilities.
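One way to separate JVM startup and JIT warm-up from the computation itself would be to time the summation loop inside the program rather than with the shell's `time`. This isn't part of the original code; a minimal sketch (the class name `PiSlowTimed` is my own) might look like:

```java
// PiSlowTimed: same series as PiSlow, but timing only the summation loop
// with System.nanoTime(), so JVM startup is excluded from the measurement.
public class PiSlowTimed {
    public static void main(String[] args) {
        int N = Integer.parseInt(args[0]);   // number of terms

        long start = System.nanoTime();
        double sum = 0.0;
        double sign = 1.0;
        for (int k = 0; k < N; k++) {
            sum += sign / (2.0 * k + 1.0);
            sign = -sign;
        }
        long elapsed = System.nanoTime() - start;

        System.out.println("pi (approx., " + N + " terms): " + sum * 4.0);
        System.out.println("Loop time: " + elapsed / 1.0e9 + " s");
    }
}
```

Running this for a small and a large term count on the same host would show how much of the small-N times in the table is really just startup overhead; adding `-XX:+PrintCompilation` to the java command line would also show when HotSpot compiles the loop.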
This was a fun little diversion for Pi Day!