Saturday, March 4, 2023

After-Thoughts: Optimization Do's and Don'ts in Game Programming

Okay, so let's say you're someone who's interested in Game Design. You have the best idea in the world, one you think will make a lot of money while also making a lot of players very happy. Now all you need to do is select your tools and begin building. But you notice a few problems, specifically that your optimization isn't going well. What do you do? Well, here are some tips.

Naturally this is going to be a long one but I hope you guys find it interesting.

So let's start with some basics: what you should actually do while you're in your Engine of choice. I'm going to keep this generalized so that you can follow along with pretty much any Engine you're using, but I am most familiar with Unreal Engine, with some basic understanding of Unity, so keep that in mind.

Use Tick/Update Sparingly

This function has a different name depending on which Engine you're using. In UE4 and UE5 it's called Tick, and in Unity it's called Update, but regardless of the name, most Engines have some variation of it: a function that is called every frame. Now, most engines are Object Oriented, which means each individual piece of code you build can be divided into its own separate Function or Module. For those who don't know the difference as I'm using the terms here, a Function is something that returns a value, usually one that can be plugged into a variable, whereas a Module (more commonly called a procedure or subroutine) just performs instructions. Think of it like Java, where some methods start with a return type like int or boolean and others start with void. Void methods don't return anything but they still do stuff. That's essentially the difference between a function and a module on a general level.

This can have several benefits, such as building your code in pieces so that if one of them isn't working correctly, all you need to do is fix the broken piece or the broken interaction rather than scouring the entire program for what is and is not working, although you will still probably have to do that. However, the main benefit in this case is that from an optimization standpoint, you can divide up pieces of code across different modules and then the game will only call the specific modules that need to be called.

Put another way, any function that exists within a singular Actor will only be called while that actor is actually in the game world. Any functions that are specific to it will not be called if it's not there. The same applies to Level-specific code, particularly when you're using Level Streaming.

Tick or Update is very different. While variations of it may be Object specific, the function isn't called under specific circumstances like collision, overlap, or spawn. It's called every frame. To give you an idea of how taxing this can be on your machine, any code you put onto Event Tick runs every time a frame is drawn. So if your game runs at 60 FPS, it'll be called 60 times per second. And if what you have on Tick is so heavy that your machine can't finish it in time at 60 FPS, the framerate simply drops until it can keep up. This is one way a game that should easily hit its target framerate can end up falling tremendously short.

So if you want to avoid that, Step 1 is to put any code that doesn't need to be called every Frame onto an event or function that is only called when necessary. An additional step for any function that needs to be called every frame when relevant but not all the time is to place a logic gate onto that specific piece of code so that it can be disabled when it isn't relevant.

For example, try splitting those pieces of code into their own modules, calling those modules from Tick, and putting a logic gate at the very beginning of each module that checks whether it's currently relevant. If it isn't, the module returns immediately. That way the function is still called, it just won't do anything. That isn't the best situation to be in but it's still preferable if there's no other way around it.
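Here's a minimal sketch of that kind of gate in plain C++, not tied to any engine. The names DoorActor, StartOpening, bIsOpening, and WorkCount are all made up for illustration. In Unreal the per-frame hook would be the actor's Tick, and in Unity it would be Update.

```cpp
// Hypothetical actor whose per-frame work is guarded by a logic gate.
struct DoorActor {
    bool  bIsOpening = false; // the gate: is the per-frame work relevant right now?
    float OpenAmount = 0.0f;  // 0 = closed, 1 = fully open
    int   WorkCount  = 0;     // counts how often the real work actually ran

    // Called by an event (say, the player pressing a button), not every frame.
    void StartOpening() { bIsOpening = true; }

    // Called every frame by the (imaginary) engine.
    void Tick(float DeltaSeconds) {
        if (!bIsOpening) {
            return; // gate closed: the call still happens, but it does nothing
        }
        ++WorkCount;
        OpenAmount += DeltaSeconds; // the door takes one second to open
        if (OpenAmount >= 1.0f) {
            OpenAmount = 1.0f;
            bIsOpening = false; // finished, so close the gate again
        }
    }
};
```

The event flips the gate on, Tick does its work only while the gate is open, and the work switches itself back off when it's done.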

Optimize Your Variable Types

Okay so this'll be in two categories but they're both the same problem. When programming in an Object Oriented manner, there is an inclination to make variables for every single function you want to use. And there are times to use variables. There may be a situation where you want to check for a state or substate so that things run as intended, especially if you have an input buffer. Putting that onto a variable that can just be checked when necessary is a good thing to have.

However, not everything needs to be a variable. For example, if you have a value that is specific to a function rather than general to an object, don't make it a variable, make it a parameter. That way the value only exists while the function is being called. Then you can set the parameter upon call using the value itself (known in programming as a literal) rather than a stored variable. Literals are very valuable in situations where you need to cut down on overhead because they're just the values by themselves, baked into the code without any storage on the object. They don't take up memory once the call is over. The reason to use a variable is if the value needs to be stored. Parameters offer a good mix of the two.
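As a rough sketch of the difference in plain C++ (Player, Health, and ApplyDamage are hypothetical names, not anything from a real engine):

```cpp
// Hypothetical player object. Health has to persist between calls,
// so it is stored on the object as a variable.
struct Player {
    int Health = 100;

    // The damage amount only matters during the call, so it is a parameter
    // rather than another variable stored on the object.
    void ApplyDamage(int Amount) {
        Health -= Amount;
        if (Health < 0) {
            Health = 0; // clamp so Health never goes negative
        }
    }
};
```

Calling it as ApplyDamage(25) passes the literal 25 straight in, with no extra variable living on the Player.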

However, let's not ignore the fact that there will be many times where the value needs to be stored, either because it's being accessed constantly or because it's universal enough that it's better kept as a single stored value rather than parameterized or dropped from memory once computed. If it only needs to be referenced once, then let it go to make space for other things. However, if it's going to be referenced constantly, it's better to keep it in memory so that you're not wasting instructions on constantly bringing it in, removing it, and so on. It's better to just keep it and let it do its thing if that's the case.

But that doesn't mean all variable types are created equally. Variable types will vary depending on the Engine but some are fairly universal. Let's go over a few very quickly.

Boolean: A boolean is usually presented as a True-False value, but conceptually it's a single bit, a 1 or a 0, on or off. It only has two states because of this. Either it's a 0, false, off, etc, or it's a 1, on, true, etc. It doesn't consume much memory or computation. One caveat: most languages and engines actually store a standalone boolean in a full byte, because a byte is the smallest unit of memory that can be addressed, so it only shrinks to a true single bit when several booleans are packed together as bit flags.

Byte: Bytes are whole number values that are 8 bits large. 8 bits means it has 8 switches that can each be either on or off. In Binary, that gives a minimum of 00000000 and a maximum of 11111111, with the first being equal to 0 and the second being equal to 255, and it can hold any whole number in between. Written the usual way, with the smallest bit on the right, 00000010 is equal to 2, 00000101 is equal to 5, and 10000000 is equal to 128, because every successive bit is worth double the previous one. I'll talk more about this in the actual tips portion.
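If you want to check the binary math yourself, here's a small C++ sketch. The constant names and the BitValue helper are made up for the example, and binary literals need C++14 or later.

```cpp
#include <cstdint>

// Binary literals are written with the biggest bit on the left.
constexpr std::uint8_t Zero           = 0b00000000; // every switch is off
constexpr std::uint8_t Two            = 0b00000010; // only the "2" switch is on
constexpr std::uint8_t Five           = 0b00000101; // 4 + 1
constexpr std::uint8_t OneTwentyEight = 0b10000000; // only the top switch is on
constexpr std::uint8_t TwoFiftyFive   = 0b11111111; // every switch is on

// What a single switch is worth: position 0 is worth 1, position 1 is worth 2,
// and each position after that doubles.
constexpr unsigned BitValue(unsigned Position) { return 1u << Position; }
```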

Integer: This is a standard one as well. In Unreal, Unity, and most modern languages it's a 32-bit whole number. However, it's different from the Byte in one more meaningful way: a Byte cannot go below zero. Any attempt to will wrap it around to the top of its range. Integers, however, are signed, which means half of the range is spent on negative values. A 32-bit Integer runs from -2,147,483,648 to 2,147,483,647. You can raise the maximum by making it unsigned, which gives up negative numbers for a cap of 4,294,967,295 without increasing the bit-size, but that's the final meaningful distinction.
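A quick C++ sketch of those ranges, using the fixed-width types from the standard library so the bit counts are explicit. The constant names and SubtractOne are hypothetical helpers for the example.

```cpp
#include <cstdint>
#include <limits>

// Ranges of the fixed-width types, straight from the standard library.
constexpr std::uint8_t  ByteMax = std::numeric_limits<std::uint8_t>::max();
constexpr std::int32_t  IntMin  = std::numeric_limits<std::int32_t>::min();
constexpr std::int32_t  IntMax  = std::numeric_limits<std::int32_t>::max();
constexpr std::uint32_t UIntMax = std::numeric_limits<std::uint32_t>::max();

// An unsigned byte cannot go below zero: subtracting 1 from 0 wraps around
// to the top of the range instead.
inline std::uint8_t SubtractOne(std::uint8_t Value) {
    return static_cast<std::uint8_t>(Value - 1);
}
```

Signed and unsigned 32-bit values cover the same number of distinct values. The signed one just spends half of them below zero.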

Integer-64: Not every engine accommodates this variable type but I will bring it up for comprehension's sake. An Integer-64 is almost identical to an Integer except that rather than a 32-bit value it is a 64-bit value. That means 2^64 possible values instead of 2^32. Other than that, it's functionally identical to the Integer type.

Floating Point/Float: This is where things start to get a little interesting, because Floats are 32-bit values but they differ from integers in that they don't handle whole numbers exclusively. They also handle decimals. Converting a number like 2.5 to an Integer cuts off the decimal, turning it into a 2. It won't round, it'll truncate. Now, Unreal Engine comes with rounding functions, and a round-up function is easy enough to build yourself. However, decimals offer some additional utility that Integers can't provide, such as framerate counting, percentage calculations, and so on. For now, let's move on.
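To make the truncate versus round distinction concrete, here's a sketch using the C++ standard library rather than any engine's math functions. Truncate, RoundNearest, and RoundUp are names I made up for the example.

```cpp
#include <cmath>

// Converting a float to an int drops the decimal part (moves toward zero).
inline int Truncate(float Value) { return static_cast<int>(Value); }

// Rounds to the nearest whole number, with halves going away from zero.
inline int RoundNearest(float Value) { return static_cast<int>(std::lround(Value)); }

// Always rounds up to the next whole number.
inline int RoundUp(float Value) { return static_cast<int>(std::ceil(Value)); }
```

So 2.5 truncates to 2 but rounds to 3, and 2.1 rounds up to 3.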

Vector: Vectors are not a universal term but I'll use it for the sake of this. A Vector, in Unreal Engine terms, is basically a combination of 3 float values used for axis calculations (UE5 widens these to double precision by default, but the idea is the same). Very frequently these are used as XYZ calculations in 3D Graphics, but they can also be used as RGB color values, with X = R, Y = G, and Z = B.

2D Vector: 2D Vectors are very similar except rather than XYZ they only contain XY. These are universally more common in 2D applications but they are still useful for Texture mapping since an XY can also be used as UV.

4D Vector: 4D Vectors have 4 float values rather than 3. These are represented as XYZW in some programs, though they're more frequently used as RGBA, with the 4th value being Alpha (opacity), which is very useful for Texture Maps of the PNG variety in particular.

Rotators: Rotators are also made up of 3 float values like Vectors are, but Vectors are linear values, which means they work just fine for location and scale. They don't work super well for rotation, though, so Rotators get their own variable type with their own functionality. A Rotator usually works on values from 0 up to just under 360 degrees, with anything over wrapping back around to 0. This is one of those cases where overflow is a good thing. If your character goes from 359 to 360 in a single frame, naturally they end up back at 0. You don't want them to spin all the way back around to get there, you just want them to be at the starting angle the very next frame.
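The wrap-around can be sketched in a few lines of plain C++. This is not Unreal's actual Rotator code, just an illustration of the idea, and WrapDegrees is a made-up name.

```cpp
#include <cmath>

// Wraps any angle in degrees into the range 0 up to (but not including) 360.
inline float WrapDegrees(float Degrees) {
    float Wrapped = std::fmod(Degrees, 360.0f); // result keeps the sign of Degrees
    if (Wrapped < 0.0f) {
        Wrapped += 360.0f; // bring negative angles back into range
    }
    if (Wrapped >= 360.0f) {
        Wrapped = 0.0f; // guards against rounding landing exactly on 360
    }
    return Wrapped;
}
```

With this, 360 becomes 0, 370 becomes 10, and -10 becomes 350, so the character never has to spin the long way around.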

Transform: Transform is comprised of 9 floats in theory. However, in 3D graphics processing, it's usually comprised of a Vector for Location, a Rotator for Rotation, and a Vector for Scale. Because of this it's mildly more complicated than an Array of Vectors, which I'll get to in a moment.

Arrays: Arrays are not a variable type so much as a variation on one. To put it simply, an array is a variable of any type that holds multiple separate values at once rather than a single changing value. Put another way, an Array of 9 Integers doesn't hold a single Integer, it holds 9 separate integers together. This does mean that an array of 9 integers takes as much space as 9 integer variables in theory. In practice, though, the benefit is that all those integer values are placed right next to each other in memory, so if several of them are used in the same calculation, they can be fetched from more or less the same place. Object Oriented Code already does this to a degree by keeping an object's variables together, but nothing's stopping you from using arrays to localize even further.
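Here's a small C++ sketch of what "next to each other" means, using std::array. StrideInBytes and Sum are hypothetical helper names for the example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Distance in bytes between two neighbouring elements of the array.
inline std::size_t StrideInBytes(const std::array<std::int32_t, 9>& Values) {
    const auto* First  = reinterpret_cast<const unsigned char*>(&Values[0]);
    const auto* Second = reinterpret_cast<const unsigned char*>(&Values[1]);
    return static_cast<std::size_t>(Second - First);
}

// Walks the array front to back, touching memory in order.
inline std::int32_t Sum(const std::array<std::int32_t, 9>& Values) {
    std::int32_t Total = 0;
    for (std::int32_t Value : Values) {
        Total += Value;
    }
    return Total;
}
```

The stride comes out to exactly the size of one integer, meaning there's nothing in between the elements. That's what makes a loop like Sum friendly to the CPU's cache.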

There are other variable types, such as Characters, Strings, Text, Enumerations, etc. However, those are usually very difficult to optimize since they have very specific purposes. If you're using a Character variable, you're doing so because each individual character needs to be sortable. If you're using a string, it's usually for debugging purposes. If you're using text, it's for presentation purposes and so on.

Every other variable type I listed, however, has a lot of overlap with the others, so it's much easier to work out when one type is preferable over another. So let's discuss.

Logic Gates

A logic gate is any barrier that only runs the code behind it when a condition has been met. Generally there are 3 types of logic gate you might use.

If/Else uses Boolean conditions, it checks for True and False. Switch Case uses Integers, Bytes, Strings, Characters, Enums, etc. And Loops will use Integers or Arrays.

So the question is, which logic gate do you use? Well, generally, there's going to be 1 of 2 answers for If/Else compared to Switches. If/Else only has two outputs, a true and a false. If you only need to check for 2 outputs anyway, then you may as well use an If/Else, since its condition can be any check you like. However, if a single input needs many potential outputs, it's better to use a Switch. The reason for this is fairly simple: a lot of people will check for whether something is doable by chaining if statements. For example:

If (Alpha == A)
    Do A
Else if (Alpha == B)
    Do B
Else if (Alpha == C)
    Do C

The problem with this in practice is that you're checking the condition again on every branch, and the more branches you have to fall through, the more those checks add up.

However, if you have a Switch Case, it'll be more like:

Switch on Alpha
    Case A: Do A
    Case B: Do B
    Case C: Do C
    Case D: Do D
    Case E: Do E

In this case what's happening is it's checking the variable Alpha against each of those values, and because Alpha is one variable with one value, the only thing it'll do is the one that matches Alpha. If you want, you can even add a Default option, which is what the Switch does when none of the other options match.

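This may not look faster but it often is in practice, because Alpha is only evaluated once and the compiler is free to jump straight to the matching case. Whereas in the If/Else case it's checking Alpha each time to see whether it equals A, then B, then C, then D, then E. The difference is the Switch checks once where the If/Else checks up to 5 times. Bear in mind that the gap is usually small and a good compiler can optimize both forms, so measure before you rewrite anything.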
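Here are the two forms side by side in C++, as a sketch. EMoveState and the speed numbers are invented for the example. Both functions return the same results, they just get there differently.

```cpp
#include <cstdint>

// Hypothetical movement states, stored in a single byte.
enum class EMoveState : std::uint8_t { Idle, Walking, Running, Jumping, Falling };

// If/Else chain: State may be compared up to five times before a match.
inline int SpeedWithIfElse(EMoveState State) {
    if (State == EMoveState::Idle)         { return 0; }
    else if (State == EMoveState::Walking) { return 200; }
    else if (State == EMoveState::Running) { return 600; }
    else if (State == EMoveState::Jumping) { return 400; }
    else if (State == EMoveState::Falling) { return 400; }
    return 0; // nothing matched
}

// Switch: State is evaluated once and control goes to the matching case.
inline int SpeedWithSwitch(EMoveState State) {
    switch (State) {
        case EMoveState::Idle:    return 0;
        case EMoveState::Walking: return 200;
        case EMoveState::Running: return 600;
        case EMoveState::Jumping: return 400;
        case EMoveState::Falling: return 400;
        default:                  return 0; // the Default option
    }
}
```

Whether the Switch ends up measurably faster depends on the compiler and the number of cases, so treat it as a readability win first and a speed win second.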

Now of course a Switch won't take float values. So if you have a variable that checks for multiple different states, try using a whole number variable like an Integer or a Byte (or an Enum) so that you can run a Switch case.

As for Loops, well, I generally try to restrict Loop usage to Arrays because that consolidates how much actually needs to be done. So I'll generally use a For Each Loop and then do what needs to be done for whatever spot in the array it's currently at. I don't generally use Loops for anything that isn't specifically for an Array anymore because that can be pretty wasteful if you don't know exactly what range you're working within.

Variable Sizes

Now the thing with Object Oriented Programming is that if you're not careful, the total number of variables you're working with will rapidly go up. Granted, there's a minimum number of variables you need for your code to function. However, not every variable is the same size.

You see, because the benefit of a variable is that it stores values for upcoming use, a variable's actual consumption of memory is not based on its current value, it's based on the maximum value it can accommodate. So if you make an Integer variable, the number of bits it uses in RAM doesn't depend on how many bits its current value needs. It uses exactly 32 bits (on the typical engines where an Integer is 32-bit), because that Integer doesn't know if the number it needs will go up or down from where it is now at any time. As a result it's up to you as the programmer to discern whether the maximum value that variable can accommodate is entirely necessary.
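You can see this directly in C++ with sizeof. In this sketch, BitsUsed and the three constants are made-up names, and I'm assuming 8-bit bytes, which holds on every mainstream platform.

```cpp
#include <cstddef>
#include <cstdint>

// Bits a variable of type T occupies, whatever value it currently holds.
// Assumes 8-bit bytes.
template <typename T>
constexpr std::size_t BitsUsed() { return sizeof(T) * 8; }

// The same small value stored in three different types.
constexpr std::uint8_t SmallAsByte  = 7;
constexpr std::int32_t SmallAsInt   = 7;
constexpr std::int64_t SmallAsInt64 = 7;
```

All three hold the number 7, but they take up 8, 32, and 64 bits respectively. The type decides the size, not the value.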

Now of course, there are circumstances where you'll need negative numbers, positive numbers, whole numbers, decimals, big numbers, and small numbers. However, the thing about speed and size is that if you want to push everything close together, you'll want those variables to be the same type so that they can be placed inside an array. So what do you do? Well, for the most part there aren't too many situations where you'll need to make a choice. Obviously if something can be handled by a Boolean, you won't want to change its type just to fit it into an array of something bigger, because a Boolean is the smallest a variable can get and the extra bits would offset any benefit the array offers. So keeping an array of Booleans alongside an array of Integers is usually a better idea than forcing everything into one gigantic array of a single type.

But what about variable types that are a lot more similar? For example, Bytes, Integers, and Integer-64's are all very similar to each other. Now if you're making a Disgaea style SRPG where the numbers are supposed to feel infinite, well, true infinity simply isn't possible, but an unsigned Integer-64 is probably as close to infinite as any game would need. However, if your maximum doesn't need to be that large, or numbers aren't being presented to the player at all, then Integer-64's are basically 2 times the bit count of a 32-bit Integer without 2 times the benefit.

Alright so that just leaves Integer compared to Byte. Now, if the number is never shown to the user, whether you need an Integer or can settle for a Byte depends on how big the numbers will actually get. For example, if you only need to accommodate numbers 1-10 for a specific function, you don't need a negative and you don't need 32 bits worth of numbers, so a Byte can cover that just fine. If your highest value is 100 and you don't need a negative, use a Byte. If the way you've coded your game does require negative values, see if you can modify it just slightly so that it uses only non-negative whole numbers instead. For example:

If you have a direction check that uses a Dot Product, which for two normalized vectors returns a number between -1 and 1, don't store the raw value that was returned. Perform a check on it instead. Say for example you have a forward, a back, and a neutral. If it's neutral, return 0, if it's forward, return 1, and if it's back, instead of returning -1, return a 2. That way you can more easily swap the Integer for a Byte.
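A sketch of that direction check in plain C++. Vec3, Dot, EDirection, ClassifyDirection, and the DeadZone tuning value are all hypothetical. An engine would give you its own vector type and dot product.

```cpp
#include <cstdint>

// Minimal stand-in for an engine vector type.
struct Vec3 { float X; float Y; float Z; };

inline float Dot(const Vec3& A, const Vec3& B) {
    return A.X * B.X + A.Y * B.Y + A.Z * B.Z;
}

// One byte is enough: 0 = neutral, 1 = forward, 2 = back (standing in for -1).
enum class EDirection : std::uint8_t { Neutral = 0, Forward = 1, Back = 2 };

// Forward and ToTarget are assumed to be normalized, so the dot product
// lands between -1 and 1. DeadZone is a made-up tuning value.
inline EDirection ClassifyDirection(const Vec3& Forward, const Vec3& ToTarget,
                                    float DeadZone = 0.1f) {
    const float Alignment = Dot(Forward, ToTarget);
    if (Alignment > DeadZone)  { return EDirection::Forward; }
    if (Alignment < -DeadZone) { return EDirection::Back; }
    return EDirection::Neutral;
}
```

The raw float never leaves the function. Everything downstream only sees a one-byte state it can feed straight into a Switch.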

However, maybe your code absolutely needs negative values, as in something about the way your game works simply requires negative numbers. Say for example you have a monetary sort of game where you have things that earn you currency and things you can spend your currency on, but to simplify the code, you only want to do a + operation. This would mean that for things that grant you money the number would have to be positive, and for things that cost money it would have to be negative. That way, you can pass the general Benefit/Cost as a single parameter and only have one math calculation per transaction. This is a good idea in a sense.

However, instead of using an Integer for that, why not a Float?
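As a sketch, the single + operation with a Float could look like this in C++ (Wallet, Balance, and Apply are made-up names). One thing worth knowing before you commit to it: assuming standard IEEE 754 floats, a 32-bit float can only represent whole numbers exactly up to 16,777,216, so past that point adding 1 no longer changes the balance.

```cpp
// Hypothetical wallet: rewards are positive, costs are negative,
// and every transaction is a single + operation.
struct Wallet {
    float Balance = 0.0f;

    void Apply(float Delta) { Balance += Delta; }
};
```

So Apply(50.0f) followed by Apply(-20.0f) leaves a balance of 30. If your currency totals can climb into the tens of millions, that exactness limit is a reason to think twice.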

Calculation Speeds

Now then let's talk about the difference between an Int and a Float. As I've said previously, an Int is a whole number and a Float is a decimal. On most engines both are 32 bits, so size alone doesn't make one preferable to the other. You see, Integers, Integer-64's, and Bytes are really easy to compare to each other because they're all whole number values. Floats, however, have an interesting history.

You see, once upon a time, back when processors were first being made, the math was handled by a part of the CPU called the ALU, which stands for Arithmetic Logic Unit. The ALU handles whole number math and logic. Decimal values such as floats needed their own hardware, the FPU or Floating Point Unit, which in early machines was a separate, optional chip. Without one, Float calculations had to be emulated in software and were often very slow. So while decimals offered benefits, it generally wasn't a big enough benefit to justify using them over whole number values, which could sometimes be BS'd into Percentage Calculations if necessary by setting the Max meter to 100 and then setting the damage to a whole number somewhere between 1-100. Or if it's an RPG with percentage calculations, simply work it out as value*percentage/100. The point is that in many cases where percentages were necessary, the code may have BS'd it in the back end, simulating percentages while actually just calculating in whole-number increments out of 100.
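For reference, taking a percentage with nothing but whole numbers means multiplying first and dividing by 100 last. A tiny C++ sketch, with PercentOf as a made-up name:

```cpp
// Percentage of a value using only whole-number math.
// Multiplying before dividing keeps as much precision as possible,
// and the final division truncates any remainder.
inline int PercentOf(int Value, int Percent) {
    return Value * Percent / 100;
}
```

So 20 percent of 250 is 50, and 50 percent of 99 comes out as 49 because the half gets cut off.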

However, one thing that happens a lot with innovative technology that is weak but rife with potential is that manufacturers will often overcompensate on increasing its power. This happened with floating point calculations to the point where now, in the current era, float math is roughly as fast as whole number math for most everyday operations. And while it is true that Floats are the same bit-size as Ints, Bytes are 1/4th the bit-size. So say we have 20 Ints for the sake of argument. At 32 bits each, that's 20*32 = 640 bits. Turning half of those into Bytes and half into Floats would be (10*8) + (10*32) = 80 + 320 = 400 bits. All of that saving comes from the Bytes, I won't lie. However, consider this:

If you were using that many Ints for the sake of logic gates and other things that are used everywhere, it's possible that a majority of those Ints could very easily be turned into Bytes. So if out of 20 Ints you turned different shares into Bytes and Floats, here's the chart of Bytes : Floats : total bits, followed by the share that are Bytes:

10 : 10 : 400 = 1/2
11 : 9 : 376 = 11/20
12 : 8 : 352 = 3/5
13 : 7 : 328 = 13/20
14 : 6 : 304 = 7/10
15 : 5 : 280 = 3/4

So the more of those Ints you can turn into Bytes, the better. If you maintain something like a 7/10ths share of Bytes and make the rest Floats, you've removed Ints entirely and cut the bit count to less than half of the original 640. There are exceptions of course. Again, if you need absurdly high numbers, use Integer-64's in place of anything else and then just simplify your gameplay and total calculations elsewhere. If you can afford to use smaller numbers though, try this and see how it works.
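The bit counts in that chart are easy to reproduce. Here's a small C++ sketch where BitsForMix and BitsForInts are made-up helpers. I'm assuming 8-bit Bytes and 32-bit Floats, and the Int width is a parameter that defaults to 32 bits, which is what Unreal's int32 and C#'s int use.

```cpp
// Total bits used by a mix of Bytes (8 bits each) and Floats (32 bits each).
constexpr int BitsForMix(int ByteCount, int FloatCount) {
    return ByteCount * 8 + FloatCount * 32;
}

// Total bits used by the same number of plain Ints, for comparison.
constexpr int BitsForInts(int IntCount, int BitsPerInt = 32) {
    return IntCount * BitsPerInt;
}
```

Plugging in 14 Bytes and 6 Floats gives 304 bits, against 640 bits for 20 Ints at 32 bits each.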

But that's not all with Calculation Speeds. As stated before, the easiest way to speed up a series of same-type variables is to put them all next to each other, and the way you put them all next to each other is by putting them into an array. So if you have a series of Bytes that are all separate variables, just combine them into 1 array. If you have a series of variables that are all Bools, combine them into a Bool array. So on and so forth, you get the idea. But of course, that's not where the fun stops. No, you see, you also have 5 more variable types that are essentially multiple floats put together. You have a Vector, a 2D Vector, a Rotator, a 4D Vector, AND a Transform. This actually leads into a separate discussion, one about Structures.

A structure differs from an Array in that it is basically a variable type made up of several unique variables that may do different things. A Vector is a Structure made up of 3 floats. A Rotator is also a Structure made up of 3 floats, but it comes with additional code so that its calculations are treated as Rotational Quantities rather than Linear Quantities. A 2D Vector is a Structure made up of 2 float values, a 4D Vector is a Structure made up of 4 float values, and a Transform is a Structure made up of 2 Vectors and a Rotator. You may end up working with more complex structures but you get the point.

The major difference between an Array and a Structure is that when you reference an array, you reference individual values as necessary. A structure, on the other hand, tends to travel as one unit: when you copy it, pass it to a function, or break it apart in a visual scripting graph, every value inside comes along, even if you're only using one of them. In theory, an array and a structure both made up exclusively of floats take up the same amount of memory. The difference is that with the array you only touch the individual spot you're using, whereas with the structure you're frequently moving the whole thing. This means that while the data size might be the same in theory, the array can be faster in practice when you only need one value at a time. (In compiled code such as C++ you can read a single member of a structure directly, so there the cost comes mostly from copying whole structures around.) Moving the whole structure is fine if you need every single value at once, or if you're using so much of the struct that grabbing it all at once is faster than grabbing each individual part. However, if you only need one value at a time, you'll generally prefer an array.

Which is better? Depends on context. If you need every unique value in the variable all at once, go with a struct. If you need only a singular value from it at any given time, use the array. And if you don't know which you'll need but you need most of those values more consistently than you'll only need one, then use the struct just to be safe. However, if you think you can get away with using an array because you're only using small portions of the array at once, use the array to consolidate.
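One common way to act on this in C++ is to choose between an array of structures and a structure of arrays. That's a technique I'm swapping in for illustration, and Particle, ParticleArrays, and TotalHealth are hypothetical names. Treat it as a sketch, not a rule.

```cpp
#include <vector>

// Array-of-structures element: every particle carries all four values together.
struct Particle { float X; float Y; float Z; float Health; };

// Structure-of-arrays: each field lives in its own tightly packed array.
struct ParticleArrays {
    std::vector<float> X;
    std::vector<float> Y;
    std::vector<float> Z;
    std::vector<float> Health;
};

// Only Health is needed, but each step of the loop skips over X, Y, and Z too.
inline float TotalHealth(const std::vector<Particle>& Particles) {
    float Total = 0.0f;
    for (const Particle& P : Particles) {
        Total += P.Health;
    }
    return Total;
}

// Only the Health array is touched, so the loop reads one contiguous block.
inline float TotalHealth(const ParticleArrays& Particles) {
    float Total = 0.0f;
    for (float Health : Particles.Health) {
        Total += Health;
    }
    return Total;
}
```

If most of your code needs X, Y, Z, and Health together, the struct version is the natural fit. If a hot loop only ever wants one field, the separate arrays keep that field packed together.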

I ended up having a lot more to say than I thought I would. So I'm going to split this into pieces. The next one on this subject will be about GPU programming and Graphics Optimization. So long.
