The obvious solution to adding printf to a string class is to add a printf style function that sets up the variadic arguments, calls vsnprintf on a local buffer, and then concatenates that to the current string. The problem I see with this is that A) You are limited to the size of your temporary buffer and B) You must make an extra copy to move the string into your string class.
And so began my adventures into printf land.
My first step was to see how it's currently being done. I checked out the Microsoft standard library source to get an idea of the process. It's basically exactly as I had imagined: Check the array, write the string. Nothing overly extraordinary. It was while I was looking into some of the functions called internally that I noticed itoa.
The itoa function is something that I'm sure every c/c++ programmer has used many times without any thought about what's it's doing. The approach used in the Microsoft source reads exactly like every approach found on the internet. Straight forward and slow. Everyone seems to use the same process:
- 1. Iterate the value getting the current character using '%' and incrementing to the next value using '/'.
- 2. When finished reverse the string since it was written backwards.
Here's a link to my version of Itoa. I made two major changes. First I specialized routines for bases of two and I've specialized base ten so that the compiler can add an optimized integer trick (More details). Secondly I made sure to write my string backwards so it's only written once. I return a pointer to the start of the string. With these two optimization, my Itoa function runs about 20% faster than Microsoft's implementation. (In debug it's only 3% faster).
At this point I went to Google's Code Search in the hunt for a simple open source implementation of ftoa. It took a while to find a clean one, and I don't even remember where I got it from, but it was relatively clean and understandable so I went with it..
The first step was to make sure it was accurate. There were a bunch of standards that the function did not follow so it took quite a bit of wrangling to get make it accurate. But even after adding a lot of branches to the code it still ended up being quite a bit faster than the Microsoft implementation. I don't have direct figures to test against since there is no official function to test against, but as you will see later the float part of the printf implementation has the best improvements.
So I tried my hand at printf. My first version tried to do a single pass over the string and write everything sequentially. This ended up being really slow since I had no idea how much space I would need, I was doing a lot of allocations. For my second try I did a prepass on the string and gathered all the information first. With this I was able to estimate the required length of the string and do a single allocation. Of course this will over estimate the required string length for any printed number, but I'm not very worried about that right now.
It took a while to get all the standards working and I'm currently very close. I support everything in the Microsoft standard except types 'n', 'a', and 'A'. Other than that the only issues I have are the exact format of NaN and INF and a few rounding differences with floating point value. Overall I'm pretty happy with the result. I was even more happy when I added timing:
Test PrintF sprintf c 40940 43006 C 40809 49026 s 46911 46821 S 47440 60515 d 75029 73433 i 74859 72359 o 62811 76705 u 74409 72628 x 63525 75238 X 64545 76621 p 57695 69832 P 65410 73262 e 101759 156806 E 103602 157085 f 96503 154885 g 108164 157641 G 108170 159033 Total 1232590 1574905
These are raw clock cycles divided by 1000. All these tests are with pre-allocated strings (since sprintf doesn't allocate). There is a total about around 850000 tests performed. In total there are 31 differences between MyPrintF and sprintf (2 with E format and the rest with g and G).
The Total average savings is just under 20%. Not all types are faster. Microsoft's implementation for types d and i seem to be a bit more optimized than other integer types. For wchar conversions I just did a simple cast so the savings may not be real. The big thing to notice is the float types. Where most of the other types are 0-10% savings the floats are around 30%. The one thing I can think of that makes a huge difference would be the locale. I skipped using that since I've yet to have a reason to use it. Generally localisation happens in a seperate module, and in a very small part of the code so making every printf slower because of it sounds a bit silly to me.
One final thing that I didn't notice until the very end: I never optimized out my type safety. The timings provided are with the template functions and type safety functions left in. This also means that MyPrintf is also much safer than a regular printf. It would take a really dedicated programmer to break it.
Well I guess my closing arguments would be: Don't take things for granted. The standard libraries are there to conform to the standards, not to be optimized for you. At the same time, don't rewrite it all. And don't let someone tell you that safe code is always slow code. I've done this as an academic exercise and would never do it on a project with a deadline. Although now that I have done it I plan to use it in the future.