Starling gets wings

Version 0.9.1 of the Starling framework is now available for download, and I have to say that Daniel has done an excellent job rewriting some of the crucial parts of the rendering code. CPU rendering now runs a little faster than the old Starling with all the optimizations from my previous posts, but GPU rendering has improved even more: on my laptop the new Starling framework can handle 35% more images than the old version with my optimizations. The exact performance depends on the GPU, but I would say that the new approach is definitely the way to go with GPU rendering. Even though version 0.9.1 brings serious performance enhancements, it’s still possible to make it a tiny bit faster with relatively small changes.

The biggest change in the new Starling framework is that all Image and Quad rendering is now done in batches. During rendering, up to 8192 consecutive Images that use the same base texture and rendering options are combined into the same vertex buffer and then rendered with a single drawTriangles call. The idea is the same as in the flatten function earlier, but now it’s done automatically behind the scenes. The downside is that every frame requires some fairly heavy matrix calculations to generate the vertex buffer data, so the first place to start tweaking is the VertexData class.
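To make the batching rule concrete, here is a minimal TypeScript sketch (the post itself deals with ActionScript 3 code). It is not Starling’s implementation: the `Quad` and `QuadState` types and the `buildBatches` function are hypothetical, and a texture id plus a smoothing string stand in for "the same base texture and rendering options". Only the 8192 limit and the consecutive-quads rule come from the text above.

```typescript
// Hypothetical stand-ins for Starling's types; only the fields that decide
// whether two quads may share a batch are modelled.
interface QuadState {
  textureBase: number; // identifies the underlying base texture
  smoothing: string;   // stands in for the other rendering options
}

interface Quad {
  state: QuadState;
}

// Batch size limit mentioned in the post.
const MAX_QUADS_PER_BATCH = 8192;

function sameState(a: QuadState, b: QuadState): boolean {
  return a.textureBase === b.textureBase && a.smoothing === b.smoothing;
}

// Splits the quads (in render order) into runs of consecutive quads that
// share the same state. Each run would be written into one vertex buffer
// and drawn with a single drawTriangles call.
function buildBatches(quads: Quad[]): Quad[][] {
  const batches: Quad[][] = [];
  let current: Quad[] = [];
  for (const quad of quads) {
    const full = current.length === MAX_QUADS_PER_BATCH;
    const stateChanged =
      current.length > 0 && !sameState(current[0].state, quad.state);
    if (full || stateChanged) {
      batches.push(current);
      current = [];
    }
    current.push(quad);
  }
  if (current.length > 0) {
    batches.push(current);
  }
  return batches;
}

// Usage: three quads on texture 1, one on texture 2, one more on texture 1.
const q = (textureBase: number): Quad => ({ state: { textureBase, smoothing: "bilinear" } });
const batchSizes = buildBatches([q(1), q(1), q(1), q(2), q(1)]).map(b => b.length);
```

Here `batchSizes` holds 3, 1 and 1: the last quad gets its own batch even though it uses the same texture as the first three, because only consecutive quads are merged. That keeps the draw order intact.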

For every Image and Quad, the VertexData is first copied in the addQuad function of QuadBatch with VertexData’s copyTo function and then transformed by the matrix with the transformQuad function. Inside transformQuad the position values are again copied back and forth between Vectors. To make these two operations more efficient, the mRawData Vector in VertexData needs to be split in two: one Vector containing the position values and another containing the texture and color values. The transformation matrix is added as a parameter to the copyTo function, and the transformQuad function is dropped entirely. In copyTo, the now separate position Vector can be passed directly as input to Matrix3D’s transformVectors function, and the transformed values are then copied from the sPositions Vector into the target VertexData instance’s position Vector. With this change we get rid of two unnecessary rounds of Vector copying.
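The reworked copyTo can be sketched in TypeScript roughly as follows. Assumptions: positions are stored as x, y, z triples, the matrix is a 16-element column-major array with the translation in elements 12 to 14 (the layout assumed here for Matrix3D.rawData), and `transformVectors` is a hand-written stand-in for Matrix3D’s method, since the Flash classes are not available outside the player. Class and field names are modelled on the post but simplified.

```typescript
// Hypothetical split layout: the position store holds x, y, z per vertex;
// texture and color values would live in a separate store (not shown).
const FLOATS_PER_POSITION = 3;

// 16 numbers, column-major, translation in elements 12-14 (assumed layout).
type Matrix4 = readonly number[];

// Stand-in for Matrix3D.transformVectors: transforms every (x, y, z) triple
// of `input` (with w = 1) and writes the results into `output`.
function transformVectors(m: Matrix4, input: Float32Array, output: Float32Array): void {
  for (let i = 0; i < input.length; i += 3) {
    const x = input[i], y = input[i + 1], z = input[i + 2];
    output[i] = m[0] * x + m[4] * y + m[8] * z + m[12];
    output[i + 1] = m[1] * x + m[5] * y + m[9] * z + m[13];
    output[i + 2] = m[2] * x + m[6] * y + m[10] * z + m[14];
  }
}

// Shared scratch buffer, playing the role of sPositions in the post.
let sPositions = new Float32Array(0);

class VertexData {
  readonly positions: Float32Array;

  constructor(readonly numVertices: number) {
    this.positions = new Float32Array(numVertices * FLOATS_PER_POSITION);
  }

  // copyTo now takes the transformation matrix: the positions are
  // transformed in one pass into the scratch buffer and then copied into
  // the target at the given vertex offset, so no transformQuad step is needed.
  copyTo(target: VertexData, vertexOffset: number, matrix: Matrix4): void {
    const count = this.positions.length;
    if (sPositions.length < count) {
      sPositions = new Float32Array(count);
    }
    const transformed = sPositions.subarray(0, count);
    transformVectors(matrix, this.positions, transformed);
    target.positions.set(transformed, vertexOffset * FLOATS_PER_POSITION);
  }
}
```

With typed arrays the transform could also write straight into a `subarray` of the target; the scratch buffer is kept here only to mirror the two steps described above.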

After splitting VertexData into position data and texture/color data, the latter should be converted from a Vector into a ByteArray. Uploading ByteArray data into a vertex buffer is a lot faster, and since the color and texture data probably doesn’t change very often, the time it takes to set the values is not that important. Switching from Vector to ByteArray requires changes in so many functions that I won’t go through them all, but a few things to remember are: set the ByteArray’s endian to Endian.LITTLE_ENDIAN, always set the length and position correctly, and use the writeFloat and readFloat functions to write and read the ByteArray.
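A rough TypeScript analogue of these little-endian float writes uses a DataView, whose `setFloat32`/`getFloat32` take an explicit `littleEndian` flag. The per-vertex layout (r, g, b, a, u, v) and the `TexColorData` class are assumptions for illustration only. Unlike ByteArray, DataView has no internal position pointer, so the byte offset is computed explicitly for every access; that computation is where the "set the position correctly" rule shows up.

```typescript
// Assumed layout for the texture/color store: per vertex r, g, b, a, u, v,
// each a 32-bit float. Illustration only, not Starling's actual layout.
const FLOATS_PER_VERTEX = 6;
const COLOR_OFFSET = 0;
const TEXCOORD_OFFSET = 4;
const BYTES_PER_FLOAT = 4;
const LITTLE_ENDIAN = true; // counterpart of Endian.LITTLE_ENDIAN

class TexColorData {
  private readonly view: DataView;

  constructor(readonly numVertices: number) {
    // The buffer is sized up front, the counterpart of setting
    // ByteArray.length correctly before writing.
    this.view = new DataView(
      new ArrayBuffer(numVertices * FLOATS_PER_VERTEX * BYTES_PER_FLOAT));
  }

  // Byte position of one float element of one vertex.
  private byteOffset(vertex: number, element: number): number {
    return (vertex * FLOATS_PER_VERTEX + element) * BYTES_PER_FLOAT;
  }

  setColor(vertex: number, r: number, g: number, b: number, a: number): void {
    [r, g, b, a].forEach((value, i) =>
      this.view.setFloat32(this.byteOffset(vertex, COLOR_OFFSET + i), value, LITTLE_ENDIAN));
  }

  setTexCoords(vertex: number, u: number, v: number): void {
    this.view.setFloat32(this.byteOffset(vertex, TEXCOORD_OFFSET), u, LITTLE_ENDIAN);
    this.view.setFloat32(this.byteOffset(vertex, TEXCOORD_OFFSET + 1), v, LITTLE_ENDIAN);
  }

  getFloat(vertex: number, element: number): number {
    return this.view.getFloat32(this.byteOffset(vertex, element), LITTLE_ENDIAN);
  }

  // Raw bytes in the form that would be handed to the vertex buffer upload.
  get bytes(): Uint8Array {
    return new Uint8Array(this.view.buffer);
  }
}
```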

The changes in the VertexData class also need to be handled in the QuadBatch, Image and Quad classes. QuadBatch now requires two vertex buffers: one for the position data and another for the texture and color data. Image and Quad need to take the transformation matrix as a parameter in their copyVertexDataTo functions.

There are also a couple of places for small optimizations. The isStateChange function in QuadBatch can be optimized, since checking first whether there are no quads is definitely not the most efficient order. RenderSupport, on the other hand, uses the currentQuadBatch getter a lot, so a minor tweak there is to store the current QuadBatch in a separate variable and use that instead.
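Both tweaks can be sketched together. This shows one possible ordering of the checks, not necessarily the one Starling uses, and the TypeScript classes are simplified, hypothetical versions of QuadBatch and RenderSupport: textures are plain objects, and `drawCalls` only counts the points where a real implementation would upload the batch and call drawTriangles.

```typescript
// Batch size limit from the post.
const MAX_NUM_QUADS = 8192;

// Hypothetical minimal QuadBatch that tracks only what the check needs.
class QuadBatch {
  numQuads = 0;
  private textureBase: object | null = null;
  private smoothing = "";

  // Common case first: same texture and options with room left means no
  // state change. The "is the batch empty?" test is only reached on the
  // rarer path where something differs (an empty batch accepts any state).
  isStateChange(textureBase: object | null, smoothing: string): boolean {
    if (this.textureBase === textureBase && this.smoothing === smoothing &&
        this.numQuads < MAX_NUM_QUADS) {
      return false;
    }
    return this.numQuads !== 0;
  }

  addQuad(textureBase: object | null, smoothing: string): void {
    if (this.numQuads === 0) {
      // The first quad defines the batch's state.
      this.textureBase = textureBase;
      this.smoothing = smoothing;
    }
    this.numQuads++;
  }

  reset(): void {
    this.numQuads = 0;
    this.textureBase = null;
    this.smoothing = "";
  }
}

class RenderSupport {
  private readonly quadBatches: QuadBatch[] = [new QuadBatch()];
  private currentBatchId = 0;
  // Cached reference to quadBatches[currentBatchId], updated only when the
  // batch changes instead of being looked up through a getter per quad.
  private currentBatch: QuadBatch = this.quadBatches[0];
  drawCalls = 0;

  batchQuad(textureBase: object | null, smoothing: string): void {
    if (this.currentBatch.isStateChange(textureBase, smoothing)) {
      this.finishQuadBatch();
    }
    this.currentBatch.addQuad(textureBase, smoothing);
  }

  finishQuadBatch(): void {
    if (this.currentBatch.numQuads === 0) return;
    this.drawCalls++; // stands in for uploading the batch and drawing it
    this.currentBatchId++;
    if (this.currentBatchId === this.quadBatches.length) {
      this.quadBatches.push(new QuadBatch());
    }
    this.currentBatch = this.quadBatches[this.currentBatchId];
    this.currentBatch.reset();
  }
}
```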

After these changes you can expect about 5-10% better results in the Starling demo when using GPU rendering. With CPU rendering the improvement is a lot smaller.

Since the number of Images on screen can get really high with this version of Starling, event handling also needs to be well optimized. One place to gain performance is the Stage’s advanceTime function. On every frame update it calls the dispatchEventOnChildren function with an enter frame event. dispatchEventOnChildren goes through all the child display objects, possibly more than ten thousand of them, and checks whether each one is listening to this event. Chances are that only one display object in your application is actually interested in this event, so this is really inefficient. A quick fix is to require that only display objects that are direct children of the Stage can receive the enter frame event. This way the check in advanceTime can be limited to the root display object. This change should improve the Starling demo performance by another 5-10% when using GPU rendering. Another thing to notice is that besides the total increase of about 10-15% in rendered Images with GPU rendering, the CPU load also drops significantly, which leaves more time per frame for your actual application logic.
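The restricted enter frame dispatch might look like the sketch below. The display list classes are minimal, hypothetical TypeScript stand-ins, not Starling’s API; only the rule itself (check the direct children of the stage and nothing deeper) comes from the text above.

```typescript
type FrameListener = (passedTime: number) => void;

const ENTER_FRAME = "enterFrame";

// Hypothetical minimal display list; Starling's real classes differ.
class DisplayObject {
  private readonly listeners = new Map<string, FrameListener[]>();

  addEventListener(type: string, listener: FrameListener): void {
    const list = this.listeners.get(type) ?? [];
    list.push(listener);
    this.listeners.set(type, list);
  }

  hasEventListener(type: string): boolean {
    return (this.listeners.get(type)?.length ?? 0) > 0;
  }

  dispatchEvent(type: string, passedTime: number): void {
    for (const listener of this.listeners.get(type) ?? []) {
      listener(passedTime);
    }
  }
}

class DisplayObjectContainer extends DisplayObject {
  readonly children: DisplayObject[] = [];

  addChild<T extends DisplayObject>(child: T): T {
    this.children.push(child);
    return child;
  }
}

class Stage extends DisplayObjectContainer {
  // Only direct children of the stage are checked, so the per-frame cost no
  // longer grows with the number of objects deeper in the tree.
  advanceTime(passedTime: number): void {
    for (const child of this.children) {
      if (child.hasEventListener(ENTER_FRAME)) {
        child.dispatchEvent(ENTER_FRAME, passedTime);
      }
    }
  }
}
```

The trade-off is that an object deeper in the tree no longer receives the event on its own; if it needs per-frame updates, the root’s handler has to drive them, for example by calling an update method on it.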

The speed comparisons between the original Starling, the original Starling with my optimizations from the previous posts, and the new Starling are shown in the chart below. The results with the new Starling framework are on the bottom four rows. [Edit: The tests were run at 30 fps.]

One interesting thing is that, at least on Windows 7, the 64-bit Internet Explorer 9 Flash plugin plays Stage3D content even faster than the Flash projector. Firefox, on the other hand, performs really poorly, achieving only about half of Internet Explorer’s performance. This means that at the moment, if you are developing a top-notch Flash 11 application for the web, you should probably recommend that users avoid Firefox.

To wrap things up, Starling 0.9.1 is definitely the update everyone has been waiting for. It gives a really good performance boost compared to the previous version, and if device loss handling is added in the next version, as I hope, there’s basically no reason not to use Starling for 2D rendering when developing a Flash 11 application.

By villekoskela, on December 12, 2011 at 10:23 pm, under ActionScript 3. 30 Comments


Comments

Mike - Lime Rocket On December 14, 2011 at 9:51 am

Awesome write-up, mate. I’m interested to see the effect of these new changes on AIR mobile builds when Stage3D is turned on.
Redoc On December 20, 2011 at 5:59 pm

Nice, the new Starling update is definitely a performance boost, but there is still a lot of room for improvement compared to other engines. I was able to render 7000 sprites using the latest Starling, 14000 using ND2D and 51000 using Genome2D at 60 FPS.
- villekoskela On December 20, 2011 at 7:24 pm
  
  I just tried Genome2D, and at least on my laptop the sample application didn’t handle that many more sprites than the Starling demo. I’ll need to check it more carefully before I can post some performance comparisons.
  - Redoc On December 20, 2011 at 8:25 pm
    
    Aren’t you using a Mac by any chance? Benchmarks on Macs are all over the place for some reason, at least for me. It also seems that on Macs the reported framerate is not what I see on screen: some demos say 60 FPS, but they are nowhere near 60 FPS, being jerky and all. This never happens on PC.
    
    It also depends heavily on the exact Flash Player version, since it seems that they are improving GPU performance more and more. And where Starling has a heavily CPU-based pipeline, ND2D and Genome2D seem to have a GPU bottleneck, so they gain more from these updates. Extrapolating from this, I would also say that they will perform better on mobile, but who knows.
    
    All of the frameworks are great and have their pros and cons, and in the end we the users are the real winners because we have options to choose from 😉
  - Redoc On December 20, 2011 at 9:17 pm
    
    OK, I reran the tests to be absolutely sure, and here are the approximate numbers. All of them use the latest versions from GitHub, and all of the sprites are moving each frame.
    
    Starling: 7000 sprites
    ND2D: 13500 sprites
    Genome2D: 34000 sprites
    
    If you disable the movement it jumps way higher.
    
    The 51000 number I got before was probably using the Genome2D blit() method; I am able to render 53000 sprites at 60 FPS that way now.
  - villekoskela On December 20, 2011 at 9:56 pm
    
    Thanks for your numbers. My laptop is a PC with an i5-480M CPU, a GeForce GT 415M GPU and Windows 7. Like you said, it’s good to have competing frameworks available.
  - Redoc On December 20, 2011 at 10:11 pm
    
    Oh, my bad, I am on a PC as well: i5-650 CPU and GeForce GTX 470, which is a much better GPU, but the difference doesn’t show for some reason. Daniel mentioned something on the forum about PC performance not being up there yet. The strange thing is that you have a PC as well, and according to your Starling benchmark you were able to get better numbers.
  - villekoskela On December 20, 2011 at 10:19 pm
    
    I ran my tests at only 30 fps. I added that as a comment to the post, since even though I had mentioned it in the earlier posts, it was missing from this one.
  - Redoc On December 20, 2011 at 10:32 pm
    
    Oh, that makes sense then. It seems to point to the CPU bottleneck I mentioned, since the numbers scale with CPU speed, not GPU.

Trackbacks

By Cool Stuff with the Flash Platform – 12/16/2011 | Android Developers on December 17, 2011 at 4:01 am

[…] Koskela discusses the latest release of the Starling framework, version 0.9.1, which includes a number of important performance optimizations. Ville notes that in […]
By Ten Flash Blogs Worth Bookmarking in 2011 | Flash Developer Conference on January 6, 2012 at 4:52 am

[…] Starling gets wings […]
By Some Stage3D thoughts - ByteArray.org on January 20, 2012 at 9:29 am

[…] without waiting for a new release of Flash Player and AIR. Recently, Ville Koskela from Rovio, shared his excitement regarding Starling and its performance boost. Today, over 50% of users with Flash […]
By Introducing Flash Player 11.2 and AIR 3.2 beta 4 | Flash Developer Conference on January 30, 2012 at 2:23 pm

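[…] You can take advantage of these performance enhancements right away, without waiting for a new release of Flash Player and AIR. Recently, Ville Koskela from Rovio excitedly shared news about Starling and its performance boost. […]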
By Ten Flash Blogs Worth Bookmarking in 2011 | Duduchen’s personal site on April 11, 2012 at 4:44 pm

[…] Starling gets wings […]
By Mymarketing » Interview: Bringing Angry Birds To Facebook on February 22, 2013 at 4:07 pm

[…] there for myself, about which I also wrote on my blog. Now these optimizations (and also many others) […]
By Today’s Links | JohnAspinall.co.uk on February 22, 2013 at 6:13 pm

[…] performance was not too good, so I did lots of optimizations there myself, about which I also wrote on my blog. Now these optimizations (and also many others) have found their way into the current version of […]
By Interview: Bringing Angry Birds To Facebook - heromind - High quality design resources for graphics and web designer in 2013 on February 22, 2013 at 8:09 pm

By Interview: Bringing Angry Birds To Facebook | Photography in Australia on February 22, 2013 at 8:35 pm

By Interview: Bringing Angry Birds To Facebook on February 22, 2013 at 9:31 pm

By Interview: Bringing Angry Birds To Facebook | FloroGraphics.com on February 22, 2013 at 9:51 pm

By Interview: Bringing Angry Birds To Facebook - rehavaPress on February 23, 2013 at 1:12 am

By Interview: Bringing Angry Birds To Facebook | DigitalMofo on February 23, 2013 at 1:29 am

By Interview: Bringing Angry Birds To Facebook | Diancin Designs on February 23, 2013 at 2:56 am

By Interview: Bringing Angry Birds To Facebook - Pittsburgh Web Design & Hosting on February 23, 2013 at 7:10 am

By Interview: Bringing Angry Birds To Facebook - Steve deGuzman on February 23, 2013 at 2:18 pm

By cristianoana – Bringing Angry Birds To Facebook on February 23, 2013 at 4:55 pm

By Interview: Bringing Angry Birds To Facebook | Fescu.com | TechBlog on February 27, 2013 at 7:45 pm

By Interview: Bringing Angry Birds To Facebook | Wordpress, joomla and other web-development news on March 3, 2013 at 11:33 pm

By Interview — Bringing Angry Birds To Facebook | Smashing Magazine on June 29, 2013 at 7:35 am

By Bringing Angry Birds To Facebook on March 7, 2014 at 9:44 am
