c++cc-stringsstring-lengthstrlen

Why is strlen() about 20 times faster than manually looping to check for null-terminated character?


The original question was badly received and got many downvotes. So I thought I'd revise the question to make it easier to read and hopefully to be of more help to anyone seeing it. The original question was why strlen() was 20 times faster than manually looping through the string and finding the '\0' character. I thought this question was well founded, as everywhere I'd read strlen()'s technique to find the string length is essentially looping until it finds a null-terminating character '\0'. This is a common criticism of C strings for more reasons than one. Well as many people pointed out, functions that are part of the C library are created by smart programmers to maximise performance.

Thanks to ilen2, who linked me to a VERY clever way of using bitwise operators to check 8 bytes at once, I managed to get something that, on a string larger than about say 8 to 15 characters runs faster than strlen(), and many many times faster than strlen() when the string is considerably larger. For example, and strangely, strlen() seems to be linearly time dependent on the length of the string to finish. On the other hand, the custom one takes pretty much the same amount of time no matter the string length (I tested up to a couple of hundred). Anyway, my results are rather surprising, I did them with optimisation turned OFF, and I don't know how valid they are. Thanks a lot to ilen2 for the link, and John Zwinck. Interestingly, John Zwinck suggested SIMD as a possibility for why strlen() might be faster, but I don't know anything about that.


Solution

  • strlen() is a very heavily hit function and you can bet that several very bright people have spent days and months optimizing it. Once you get your algorithm right, the next thing is, can you check multiple bytes at once? The answer of course is that you can, using SIMD (SSE) or other tricks. If your processor can operate on 128 bits at a time, that's 16 characters per clock instead of 1.