Have you ever wanted a nice system for being able to reference assets by name but you don't want to pay the price of comparing strings or constantly hashing (crc-ing) strings at runtime?

Now you can have best of both worlds. I've looked into various forms of hashing and making use of different techniques to find the quickest way to hash strings. In this article I'll explain what strategies are usually used and how you can achieve instant hashes.

Generally in game development there are two strategies to use for referencing object. You can either give them unique names or unique IDs. Both systems have pros and cons.

Names: Names are generally a good way to reference object because all programs can work with strings. It's very easy to change the name of something and modify the code to reflect that change. The problem with strings is generally the cost. In the final version of your game you generally don't care to have the actual strings around and you don't want to spend the time doing long string comparisons. The general way of fixing this is to Hash, or CRC, the string into a unique ID. This is done to save and time. You save space in the final data and in memory at runtime and it reduces the run time cost beacuse you only have to process a string once and do simple integer comparisons when searching for objects. Most engines that I've encountered can only Hash the strings found in code at runtime so you are still paying the cost for the string's data segment and you have to spend processing time doing the actual Hashing.

IDs: IDs are another common way to reference data. At some point an editor, big file generator assigned a unique value to an asset. The problems start when you want to reference that asset in code. Using the unique number directly works but will almost immediately create problems when artists and level designers start changing assets. The usual way around this is to get the asset baking program to generate a header file with names for the unique values as either enums or global defines. This is definitely a good solution, and really fast, but then you have to recompile the code every time assets change. In a large team this generally means recompiling the entire game.

IDs are generally great and depending on how they are used they can be extremely fast. I personally don't like the way that the IDs need to be referenced in code. This is why I started looking for ways to improve the string method. I wanted to find a way to use strings in code without having the pay the price of keeping the strings in data and having to Hash them at runtime. The three methods that I looked at that could make this possible are:
  • Inline Functions
  • Template Meta-programming
  • Macros

  • Surprisingly I can go through the list quite quickly:

    Inline Functions: I could sometimes get inline functions to work on some platforms. In windows it required __forceinline which is Microsoft specific. On gcc I could get it to work but it required some very odd optimization options. Since it was pretty hard to get it to work cross platform I decided that it just wouldn't be feasible with inline functions. This actually makes sense though, since with Inline Functions that means it has to be a recursive functions. I don't know any compiler that can handle 10 layers of inline function calls let alone 20 or 64. Also setting the compiler to allow for that many inline function calls can really slow down compilation and bloat your code. Inline Functions are out.

    Template Meta-programming: Basically there's one critical failure that stops templates from working: You cannot pass a pointer as a template argument. And thinking about it further, even if you did have a fixes character array I'm pretty sure that the resulting template code would be so horrendous you wouldn't want to use it anyway Any other method would just turn into an inline function. Templates are out.

    Macros: The term Macro is starting to become taboo in our industry. Lots of people complain that they are unsafe, lead to bad programming practices, etc, etc. I personally believe that no tool is inherently bad; It's the person that uses the tool that matters. Macros have their place, and so do gotos (just had to get that out there). Using Macros it actually became easy, although a bit repetitive, to be able to achieve compile time Hashes.

    Here's the magic: I use Macros to generate a fully unrolled Hash loop. That's it. By essentially taking a recursive hashing function and unrolling it to a maximum size you can achieve string Hashing that is performed at compile time. It's basically like writing the following code:
    iHash = 'T' + ('e' + ('s' + ('t') << 5) << 5) << 5;>

    Now before you go off implementing your uber hashing macro there are a few major gotchyas that need to be addressed:

    String length: Since this is a macro there's no variability. You need to have a maximum string length for the macro. This value will directly affect how long it takes to process the code generated by the macro. If you set it to 256 and have a lot of strings you may see an increase in compile time. I've found 64 to be a good value. This value is also related to the complexity of your algorithm. A complex algorithm can easily break the preprocessor limiting your string length.

    Exponential Recursion: That should send a shutter down your spine. There is one critical flaw with this system. Since it's recursive and there are no temporary variables used you absolutely cannot reference the 'next' character in your hashing function more than once. Why the 'next' character. One odd detail is that the macro version ends up processing the string backwards. So the 'next' character in the string ends up being the current Hash state. In my small test app, with a 64 character maximum string length, referencing the next character twice made my compile time leap from seconds to nearly a minute. Imaging taking a minute for each string Hash. This is only going to work as long as you use a fairly simple hashing function. Now saying that, it is possible for a simple function to be more than enough. My simply shift left 5 does a pretty good job for instance.

     I guess it's about time I got down to the implementation details. Here are the guts of the macro:
     Using the following simple application: 

     Generates the following assembly:
     Well it's still referencing the string in places, but the TEXT segment is empty.  The big win is that the HASH macro simply turns into simple value.  Using this method we can reference strings in critical main loops and not have to worry about the performance issues... since there are none.

    For a Perl script to help generate the Macro's header file see GenConstHash.pl.

    It should be pretty straightforward to use this in any system. Care would have to be taken to ensure that any hash values generated from Asset Bakers use the same algorithm. I just made a function that internally uses the same macro. It would end up being about the same cost as a loop anyway.


    Update: Edited the text a little. Updated perl script with the latest algoritm. For more information about the performance of the Compile Time Hash see Compile Time hashing 2.0.