Pages

Monday, June 4, 2012

Optimizing Realtime Games on Android

This post was originally intended as just another topic in our Lessons Learned series. But we have spent quite a lot of time optimizing our game in the past 11 months. So we noticed we have enough material to cover here to justify a whole post on game optimizations. I’m telling you this because it will shed light on what we will present in the following post.

This post is not supposed to be an exhaustive list on Android game optimizing. Some of these topics might not apply to your situation at all. But these are the topics that we feel to be important based on our experience in creating SCAWAR Space Combat.

Choose your battles

Optimizing code often compromises readability and maintainability of the code. It also takes a lot of time and effort. Time that could be used to create more content to your game and implementing those must-have features before your first release.

Only a very small portion of game code is really performance critical. For the rest of the code you should focus on making it readable and not worry about squeezing every bit of performance out of everything.

Only optimize code when it is really necessary and even then, choose the parts to optimize with careful measuring so you don’t waste effort where it is not needed.

Measure before Optimizing

Android ships with a debugging tool called the Dalvik Debug Monitor Server (DDMS), which provides port-forwarding services, screen capture on the device, thread and heap information on the device, logcat, processes and more. What proved to be most valuable for us was the Object Allocation Tracker. You would start your game, open DDMS, select your game from the process list, click Start Allocation Tracker, play your game, pause your game and then ask DDMS for a list of the allocated objects. This shows you what objects were created and who created them. If your garbage collector screams like a bottle rocket on steroids when you play your game, the Allocation Tracker is a good tool to start searching for the culprit causing all those unnecessary object allocations. You should avoid all object creations that you can and pre-create all the rest.

When your object allocations are in check, you can fire up traceview for measuring the execution performance of different parts of your program. You need to start and stop the trace gathering programmatically. We did this by overriding the phone’s ‘back’ button to either start or stop tracing. So a typical measuring goes something like this:

  1. Start your game
  2. Start playing a level
  3. Click the ‘back’ button to start method tracing
  4. Play for 10 seconds or so
  5. Click the ‘back’ button again to stop tracing
  6. A file with .trace extension is created to your /data
  7. Open the trace file with traceview

This will show you an overview of everything that goes on while you play. The items are ordered by inclusive execution time by default, so the method that takes the longest (including everything else called from that method) is placed on top of the list.

Usually, you will scan the list for something starting with your game’s own package name. If you then see com.mycompany.mygame.Game.justScratchingMyArse() method taking 30% of all execution time, you probably should do something about it. You can click on that method and you’ll see what exactly is taking that time inside the method call. Then you just keep drilling down the list in this way until you find something that you can act upon.

Especially with Scala, Proguard and Dalvik JVM doing tricks to your code, you should always make this kind of exploratory measurements before doing any optimizations. And then make the measurements again to see if you did something great or actually just made it worse.

Cheat if you can

Fortunately we are dealing with entertainment here, not science. Only the results matter, not the way we get there. So on a CPU and memory limited device, you cheat where you can. Here’s a few ideas on where you can skip some corners.

Checking collisions, updating timers that control game mechanics, moving dots on a radar and so on. These are things that are usually done in update methods. Update methods usually run on each frame. But do you really need these calculated / executed on each frame? How about every second frame? Or every third? There’s a lot of CPU power to gain here so experiment with these.

What’s not on screen, should be handled as lightly as possible. In our game, if an enemy explodes far from the sight of the camera, we just play the sound and skip the sprite animations, effects and particle engines we normally create our explosions with.

Collision Handling and Space Partitioning

Collision handling is one of the most computation-heavy activities in our game. If you have played it, you know why. The screen is full of bullets, enemies, mines, black holes, etc. So we had to do a lot of work with the algorithm.

Due to garbage collection, we had to rewrite our beautiful 10 lines of functional programming to something like 200 lines of while loops. It was a glum moment for us all. But, unfortunately, filter and map operations create way too many temporary objects and collections to be effective here.

The game engine we used offered collision detection, but it was still a slow process to compare collisions between every bullet and every enemy on every frame. Fortunately a technique called space partitioning came to our rescue. You could really go crazy here with complex data structures and tracking algorithms. But the basic idea is to split the game area into smaller partitions. Then you just do collision detections to objects inside the same partition. You can have a screen full of enemies, but if you are shooting in just one direction, most of your space partitions will be empty of bullets. So you start your collision handling by selecting a partition and seeing if there are any bullets or enemies inside it. If either one of these is missing, you can completely skip collisions for that partition.

We didn’t want to go overboard with this, so our poor man’s space partitioning did just a simple vertical slicing of the game area. But already this brought us massive savings in CPU time.

Functional Operations and Garbage Collection

Garbage Collection is your worst enemy in real time games. Every time garbage collection occurs it hoards the CPU time from your other threads. When you have lots of discarded objects to clean, GC occurs more often and tries to spend as much CPU time as possible. This makes your game lag and have poor FPS during gameplay. Here are some tricks to avoid unnecessary discarded objects.

Reuse your variables as much as you can.

Try to generate as few new objects (sprites, game objects, etc. ) as possible during game play and never discard them. If you must discard something, choose the objects carefully and try to discard them in a phase where lag is not noticeable, ie. in Menus, Scene switches, Between levels, etc. 

Methods

When you create methods, try to avoid creating variables. Here’s an exaggerated example of bad design:

1 def follow() { 2 val direction = getDirection 3 val xVelocity = velocity * FastMath.sin(direction) 4 val yVelocity = -velocity * FastMath.cos(direction) 5 setVelocity(xVelocity,yVelocity) 6 }

Now think of situation where you have objects using this method in every frame. If there are 30 objects in every frame (30 frames/sec) and the method creates 3 discarded objects (val direction, ...), you end up with 1200 discarded objects every second and this will drive the GC mad.

To make fewer GC operations, you could make the variables into class variables or put them directly to the setVelocity method call. The latter is not pretty but will do the job.

1 def follow() { 2 direction = getDirection 3 setVelocity(velocity * FastMath.sin(direction), 4 -velocity * FastMath.cos(direction)) 5 }


Pool everything

Besides the abovementioned point of Garbage Collection being your worst enemy, object creation is also slow on Android. Both of these points guide you to create everything you need in advance, not during the game. What you do is create pools for everything. If you can, create fixed-size pools so there will be even less garbage. Create your enemies, bullets, explosions, modifiers, timers, etc. all in advance, and put them in object pools. During the game play, when you need a new object, request it from the pool. The pool will return an object for you instead of creating one, and either remove it from the pool or mark the object in the pool as being in use until you return it.

Random number generators are not made equal

Random numbers are really common in game programming and it’s very important to choose the proper Random generator for your purposes. Do not use java.util.Random, it’s not random enough and it’s too slow. What we suggest is to use either MersenneTwisterRNG, which is faster than java.util.Random, or XORShiftRNG, which is twice as fast as the MersenneTwisterRNG. These can be found at http://maths.uncommons.org/.

Another good question is: do you really need real random numbers? Here’s a simple example of how to get a random spawn point from predefined point collection.

1 val rand = scala.util.Random 2 val spawnObjectWidth=10f 3 val xPoints = (0 to 10).map(item => item*spawnObjectWidth).toArray 4 5 def getRandomXPoint:Float={ 6 xPoints(rand.nextInt(10)) 7 }

Nothing special here, but when used in spawning this could return the same point twice when spawning multiple objects to the screen. Also, as was stated above, random number generation is a slow operation. When you need plenty of random numbers, pre-generate them! Memory is cheap, CPU time on the other hand is not!

Here’s another example how to handle random spawn points without using random number generation. Since we are spawning to predefined points, we can make the spawn points random instead of getting a point from a random location of the list.

1 val spawnObjectWidth=10f 2 val xPoints = (0 to 10).map(item => item*spawnObjectWidth) 3 val randomXPoints = scala.util.Random.shuffle(xPoints).toArray 4 val randomXPointsSize = randomXPoints.size 5 var index = 0 6 def getRandomXPoint:Float={ 7 index = (index + 1)%randomXPointsSize 8 randomXPoints(index) 9 }

Looks horrible, but it does the trick. With this, there can’t be two objects in the same location when spawning the objects to the screen, unless you spawn more objects than there are spawn points in the array. The second point here is that it’s random enough.

Mathematical operations

In almost every game, you will need to do some mathematical operations, for example calculate a new position of an object (bullet) which is relative to its parent object.

1 def shift(shift: Point, direction: Float) = { 2 Point(this.x + shift.x * math.sin(direction), this.y - shift.y * math.cos(direction)); 3 }

math.sin and math.cos are mathematically accurate calculations, but they are too slow when you have to use them a lot. What you could do is pre-calculate your mathematical functions to a table and use these pre-calculated results. The values in the table are not mathematically accurate, but yet again you are not doing science but cheating your way to better performance.

Loops and Conditions

Containers
Selecting the best containers for your game is very important. You have to decide how you are going to access the data and what you are doing with the data in the container. If you’ll need to access data in the container with an index, your obvious choice is to use Arrays, but if you need a container with mutable size, using a linked list instead of an array is a much wiser choice. 

Loops
If you are using a functional language like we did, do not use ‘foreach’ in performance critical places. Foreach is often many times slower compared to a ‘while’ loop, and for that reason your optimal choice is to use ‘while’ as much as you can.

If you are iterating your container and only need a certain object from the list, try to make the loop short-circuit as fast as possible. Or if it suits your situation, instead of looping, use another data structure, for example a set or a map. Try to save time where you can.

Proguard switches

Proguard contains plenty of peephole optimizations for arithmetic operations, like turning divisions into multiplications, multiplications into bit-shifts, and so on. In performance critical loops, these optimizations can make all the difference.

Most examples of Proguard configurations you see on the internet, including Proguards own examples for Android, contain the following line:

-optimizations !code/simplification/arithmetic

This will tell Proguard not to use these arithmetic optimizations. And it’s because Dalvik, the Android JVM, had problems with these optimizations on Android 1.0 and Android 1.5. But they haven’t been a problem after that. Currently there are basically no Android 1.0 devices and 1.5 devices have a 0,3% market share. So if you are not developing for 1.5, remove the above line from your Proguard configuration and let it do its own optimizations.