After running some more performance tests with the Starling demo’s benchmark, it’s time to write once more about the results of the optimizations and about performance in general. Something I didn’t mention in the earlier posts is that I ran all the tests with the Flash Player 11 projector and not under the web browser plugin. This has a huge effect on performance, as this article will show. Also, in the tests for my previous posts my laptop was not running at full speed because some energy-saving features were still on. Some of the original numbers were affected a little, and I’ll correct those in the previous posts too.
So let’s go to the latest test results. Five different versions of the benchmark were run on four different computers: first the original benchmark using CPU rendering, then the original benchmark with Sprite flattening using CPU rendering, then the optimized Sprite flatten benchmark using CPU rendering (see “Optimizing Starling framework”), then the original benchmark using GPU rendering, and finally the optimized benchmark using GPU rendering (see “Passing the Starling Image count limit”). On one computer I also tested the effect of using different editions of the Flash 11 player. The Starling demo’s benchmark was modified to run at 30 fps, as in my previous posts.
This chart shows the number of moving Images each of the four computers can handle at 30 fps. Both the CPU and GPU optimizations improve the performance considerably.
The comparison between the different computers and Flash player editions shows a few interesting things.
The first interesting point shows up in the CPU rendering results for the slowest computer (Athlon 64 3000+), marked in blue. There the CPU rendering optimization reaches only 1.4 times the original performance. This is because of the big image in the background of the benchmark scene. Even though it stays still, it needs to be rendered on every frame, and with slower processors and CPU rendering that image alone takes quite some time to draw.
The second interesting point is that the GPU really does make a difference with Flash 11. When the five-year-old desktop computer with the slowest CPU starts using its GPU (which in fact is not that good) for rendering, it beats the laptop that has no GPU by a huge margin (2500 vs. 570 images) and even gets better results than the fastest computer with CPU rendering (2500 vs. 1970 images).
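To put those margins into ratios, here is a small calculation using only the image counts quoted above. It is written in Python just for convenience, and the variable and function names are my own labels, not anything from the benchmark code.

```python
# Image counts at 30 fps, taken from the comparison above.
old_desktop_gpu = 2500    # five-year-old desktop, GPU rendering
laptop_cpu = 570          # laptop without a GPU
fastest_cpu = 1970        # fastest computer, CPU rendering


def speedup(images, baseline_images):
    """Return how many times more images one setup handles than another."""
    return images / baseline_images


print(f"vs. laptop without GPU: {speedup(old_desktop_gpu, laptop_cpu):.1f}x")
print(f"vs. fastest CPU result: {speedup(old_desktop_gpu, fastest_cpu):.1f}x")
```

So the old desktop with its modest GPU handles roughly 4.4 times as many images as the laptop, and about 1.3 times as many as the fastest machine does with CPU rendering.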
The third interesting point is that there is quite a big performance difference between the Flash player editions. The release projector player is naturally the fastest in all the benchmarks, but the order of the release plugin player and the debug projector player depends on the benchmark. I have marked the two interesting cases in red text. The first is “CPU flatten” with the debug projector player, where the debug player performs really poorly. This is most likely because the original implementation of Sprite’s flatten function creates lots of new instances of different classes and the debug player needs to keep track of them. The second is the original “GPU” rendering benchmark with the plugin player. It handles less than half as many images as the optimized GPU rendering benchmark, which indicates that vertex buffer access has some serious overhead under the plugin player. All in all, performance under the plugin player seems to be within 60-80% of the performance of the projector player. I am hoping that the difference will shrink with new player versions.
To wrap this up, one thing to understand is that with GPU rendering support the variation in performance between different systems can be massive. In my tests the fastest computer was able to handle around 20 times as many moving images as the slowest one. This means that any Flash 11 game really should offer a way to adjust the graphical detail, both to keep the game running smoothly on slower machines that may have no GPU and to give some extra visual effects to users who have fast machines with state-of-the-art GPUs. It is also worth noting that when targeting web browsers the tuning done in “Passing the Starling Image count limit” more than doubles GPU rendering performance, and that when using the CPU for rendering it is crucial to have the optimization from “Optimizing Starling framework” in place.
That’s all for this time. The next post will probably be about handling device loss in Starling.