Ok they don't suck, but I have managed to make some improvements on them. I'm currently in the process of rebuilding my personal base library. I'm trying to do it 'right' this time by adding unit tests and making everything simple and readable above all else. I've gotten to the point where I'm writing a String class. In the past I used the C++ style '<<' operators but I find them cumbersome for generating proper formatting. There is a lot of flexibility in the good old printf function (i.e. %10.10s). In this article I aim to show the (minor) deficiencies of the current Printf functions and describe better methods.

The obvious solution to adding printf to a string class is to add a printf style function that sets up the variadic arguments, calls vsnprintf on a local buffer, and then concatenates that to the current string. The problem I see with this is that A) You are limited to the size of your temporary buffer and B) You must make an extra copy to move the string into your string class.

And so began my adventures into printf land.

My first step was to see how it's currently being done. I checked out the Microsoft standard library source to get an idea of the process. It's basically exactly as I had imagined: Check the array, write the string. Nothing overly extraordinary. It was while I was looking into some of the functions called internally that I noticed itoa.

First Tangent
The itoa function is something that I'm sure every c/c++ programmer has used many times without any thought about what's it's doing. The approach used in the Microsoft source reads exactly like every approach found on the internet. Straight forward and slow. Everyone seems to use the same process:
    1. Iterate the value getting the current character using '%' and incrementing to the next value using '/'.
    2. When finished reverse the string since it was written backwards.
Both parts of that function are horribly inefficient. The first step misses obvious optimizations like using '&' and '>>' for power of two bases. And the second string flip can completely avoided pretty easily.

Here's a link to my version of Itoa. I made two major changes. First I specialized routines for bases of two and I've specialized base ten so that the compiler can add an optimized integer trick (More details). Secondly I made sure to write my string backwards so it's only written once. I return a pointer to the start of the string. With these two optimization, my Itoa function runs about 20% faster than Microsoft's implementation. (In debug it's only 3% faster).

Second Tangent Since itoa turned out to be a simple win I was curious how floats were handled. I found out that there's no standard ftoa function. Basically the consensus on the internet is to use sprintf to write your float value to a string. Deep in the Microsoft's platform code I did find a reference to a conversion function, but it takes very specific values that are not generally exposed, and it's likely a very platform specific implementation.

At this point I went to Google's Code Search in the hunt for a simple open source implementation of ftoa. It took a while to find a clean one, and I don't even remember where I got it from, but it was relatively clean and understandable so I went with it..

The first step was to make sure it was accurate. There were a bunch of standards that the function did not follow so it took quite a bit of wrangling to get make it accurate. But even after adding a lot of branches to the code it still ended up being quite a bit faster than the Microsoft implementation. I don't have direct figures to test against since there is no official function to test against, but as you will see later the float part of the printf implementation has the best improvements.

Back to the topic at hand
So I tried my hand at printf. My first version tried to do a single pass over the string and write everything sequentially. This ended up being really slow since I had no idea how much space I would need, I was doing a lot of allocations. For my second try I did a prepass on the string and gathered all the information first. With this I was able to estimate the required length of the string and do a single allocation. Of course this will over estimate the required string length for any printed number, but I'm not very worried about that right now.

It took a while to get all the standards working and I'm currently very close. I support everything in the Microsoft standard except types 'n', 'a', and 'A'. Other than that the only issues I have are the exact format of NaN and INF and a few rounding differences with floating point value. Overall I'm pretty happy with the result. I was even more happy when I added timing:

Test  PrintF     sprintf
c     40940      43006
C     40809      49026
s     46911      46821
S     47440      60515
d     75029      73433
i     74859      72359
o     62811      76705
u     74409      72628
x     63525      75238
X     64545      76621
p     57695      69832
P     65410      73262
e     101759     156806
E     103602     157085
f     96503      154885
g     108164     157641
G     108170     159033
Total 1232590    1574905

These are raw clock cycles divided by 1000. All these tests are with pre-allocated strings (since sprintf doesn't allocate). There is a total about around 850000 tests performed. In total there are 31 differences between MyPrintF and sprintf (2 with E format and the rest with g and G).

The Total average savings is just under 20%. Not all types are faster. Microsoft's implementation for types d and i seem to be a bit more optimized than other integer types. For wchar conversions I just did a simple cast so the savings may not be real. The big thing to notice is the float types. Where most of the other types are 0-10% savings the floats are around 30%. The one thing I can think of that makes a huge difference would be the locale. I skipped using that since I've yet to have a reason to use it. Generally localisation happens in a seperate module, and in a very small part of the code so making every printf slower because of it sounds a bit silly to me.

One final thing that I didn't notice until the very end: I never optimized out my type safety. The timings provided are with the template functions and type safety functions left in. This also means that MyPrintf is also much safer than a regular printf. It would take a really dedicated programmer to break it.

Well I guess my closing arguments would be: Don't take things for granted. The standard libraries are there to conform to the standards, not to be optimized for you. At the same time, don't rewrite it all. And don't let someone tell you that safe code is always slow code. I've done this as an academic exercise and would never do it on a project with a deadline. Although now that I have done it I plan to use it in the future.