Localization Bugs: String length limitations, #2

Let's continue on the topic of string length limitations. Yesterday I showed an example of how a string length limitation might lead to truncated text. That's not pretty, but it's not too bad either. At least nothing is broken. But truncated text isn't the only thing that can happen when string length limitations come into play.

If you're a developer, it's probably no surprise that string length limitations are closely related to character buffers, and where there are buffers there might be buffer overflows. A lot has been written about buffer overflows, and I won't repeat it here. Instead I'll try to show how buffer overflows can be exposed by localization, or how safe coding practices can lead to bugs in localized software.

I have a bit of a problem right now though. I don't have any fancy screen shots to illustrate my points any more. Clippings and hotkeys - those I can create at will, so they're easy to get screen shots of. But what I'll be talking about now is a little bit harder to show - especially since Windows is quite resilient to long translations these days. No matter, I'll just talk anyway.

Oftentimes, truncated text appears when the developer is trying to do the right thing - the code probably uses strncpy to copy my translation to some buffer, chopping it off at a "safe" length in the process. But what happens to the string after this? Well, as we saw yesterday, the text might simply be displayed on screen. Safe enough.
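To make the "safe" copy concrete, here is a minimal sketch of what such code might look like. The buffer size LABEL_MAX and the helper copy_label are made up for illustration and are not taken from any real product. Note that strncpy does not write a terminator when it truncates, so the sketch adds one explicitly.

```c
#include <string.h>

#define LABEL_MAX 16  /* hypothetical "safe" length picked by the developer */

/* Copies src into dst, keeping at most dst_size - 1 characters.
   dst_size must be at least 1.
   Returns 1 if the text had to be truncated, 0 if it fit. */
static int copy_label(char *dst, size_t dst_size, const char *src)
{
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';  /* strncpy leaves dst unterminated on truncation */
    return strlen(src) >= dst_size;
}
```

With a 16-byte buffer, a short English string fits, while a longer translation such as "Einstellungen speichern" comes out as "Einstellungen s". Returning a truncation flag at least gives the caller a chance to notice.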

More interesting things could happen though. What if there's a string dependency such that one translation in one file needs to be consistent with some other translation in some other file? And what if the translations are indeed consistent in these files, but happen to be longer than what the developer anticipated? And what then if the full string is compared with a truncated version of the same? In this scenario, the string dependency is broken, with unknown results. This might sound far-fetched, but things like this have happened.
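The broken dependency can be sketched in a few lines. Everything here is hypothetical: KEY_MAX, the helper name, and the sample strings are only there to show the mechanism, where one side of the comparison has gone through a truncating copy and the other has not.

```c
#include <string.h>

#define KEY_MAX 12  /* hypothetical buffer size: room for 11 characters */

/* Simulates a lookup where the key is first pushed through a truncating
   copy and then compared with the full string stored elsewhere.
   Returns 1 on a match, 0 otherwise. */
static int matches_after_truncation(const char *stored_full, const char *lookup)
{
    char key[KEY_MAX];
    strncpy(key, lookup, sizeof key - 1);
    key[sizeof key - 1] = '\0';
    return strcmp(stored_full, key) == 0;
}
```

A 9-character name like "Documents" still matches itself. A 14-character translation like "Eigene Dateien" does not, even though both files contain exactly the same text.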

One can picture other things going awry as well when strings are truncated. Imagine that two strings are concatenated and then fed to _snprintf. Imagine that the first of those strings contains the placeholder %d and the second contains the placeholder %s. Now imagine that the first string had a long translation where the %d is towards the end of the string, and that before the strings are concatenated, the first string is truncated so that the placeholder is lost. The code still passes an integer and a string pointer, but the only placeholder left is %s, so _snprintf tries to read the integer as a pointer. When you feed the resulting string to _snprintf, the dead will walk the earth again. Or, more likely, you'll get an AV (access violation). (By the way, placeholders deserve their own posts - I'll get to that next year.)
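One possible guard, sketched here under my own assumptions rather than as anything a particular product does, is to count the placeholders left in the final format string and refuse to call _snprintf when the count doesn't match the number of arguments. The helper count_placeholders is hypothetical. It only counts; it doesn't check that the types still line up.

```c
#include <stddef.h>

/* Counts conversion specifications in a printf-style format string.
   "%%" is a literal percent sign and is not counted. A lone '%' at the
   very end (a placeholder cut in half by truncation) is not counted either. */
static size_t count_placeholders(const char *fmt)
{
    size_t n = 0;
    for (const char *p = fmt; *p != '\0'; ++p) {
        if (*p != '%')
            continue;
        if (p[1] == '\0')
            break;          /* dangling '%' at the end of the string */
        if (p[1] != '%')
            ++n;            /* start of a real conversion such as %d or %s */
        ++p;                /* skip the character that follows the '%' */
    }
    return n;
}
```

If the untruncated format had two placeholders and the string about to be formatted has only one, the caller can fall back to the untranslated text or report an error instead of crashing.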

There's more to say about these things, but I'll cut it short here. Next time, I'll ramble on a little bit more but after that I'll get to something more constructive - how I can troubleshoot the Case of the Crashing Application and what we do to expose these problems on a larger scale.


This posting is provided "AS IS" with no warranties, and confers no rights.

Comments

  • Anonymous
    December 16, 2004
    String length limitation issues bring back memories of a tricky bug that I fixed a couple of years ago.

    Initially it seemed localization specific. It only happened on some Japanese machines. After some research I narrowed it down to machines using the Japanese Imperial calendar.

    The code assumed that 20 bytes should be enough to hold a string for a "short date". This specific short date format has 2 characters for the era (or Emperor), the year, the character for year, the month, the character for month, the day of the month and the character for day. When all the numeric values had 2 digits it was 11 double-byte characters (22 bytes) + 1 byte because it was a zero-terminated MBCS string = 23 bytes.

    As usual with overflows, display was only part of the issue. It caused other problems (actually these other problems were detected before we realized that the display portion wasn't right).

    Even though it was initially noticed on some localized systems, it could be reproduced on English systems as well (you can actually make "short" dates quite "long").

    After fixing this bug, I started playing around with custom strings for the short date on my main machine. Our software was fine, but a lot of other software didn't quite like it, and after a couple of weeks I had to return my machine to "normal".
  • Anonymous
    December 16, 2004
    That's a nice one! I love these bugs - it's so much fun taking a crashing application and tracking down the actual cause. Especially when you end up proving that it's not localization's fault after all :)