Overlong UTF-8 Escapes Bite

Every once in a while a security bug pops up that really piques my interest, and a new directory traversal bug that affects Apache Tomcat (https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2008-2938) most certainly made me take notice because I haven't seen this bug type in a lllooonnnggg time.

It caught my eye because of these six little characters:

%c0%ae

Many people think these characters represent a 16-bit Unicode character. Wrong. They are an invalid byte sequence that a careless decoder turns into the '.' (%2e) character; this is often called an "overlong UTF-8 escape". You may be wondering why I know this little piece of trivia about UTF-8: IIS4 and IIS5 were bitten by the same class of bug eight years ago, and it was an attack vector for the Nimda worm. The bulletin that fixed the bug is MS00-078.

Thumbing to page 379 of Writing Secure Code 2nd Edition, I am reminded that the canonical form of a UTF-8 character is the shortest byte sequence that can represent that character. Remember, UTF-8 can encode characters wider than 8 bits, so it uses multi-byte sequences for them. Without going into all the involved bit-manipulation, the correct form for a '.' character is a one-byte escape, %2e, not a two-byte escape, %c0%ae.
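For the curious, the bit-manipulation is short enough to sketch. The snippet below is only an illustration of what a decoder that skips the canonical-form check would compute; the function name is made up for this example and is not taken from Tomcat, IIS, or any real library.

```python
def naive_decode_two_byte(b1: int, b2: int) -> int:
    """Decode a two-byte UTF-8 sequence WITHOUT rejecting overlong forms.

    Hypothetical helper for illustration only; a correct decoder must
    refuse any two-byte sequence whose result is below 0x80.
    """
    # Lead byte 110xxxxx contributes its low 5 bits,
    # continuation byte 10yyyyyy contributes its low 6 bits.
    return ((b1 & 0x1F) << 6) | (b2 & 0x3F)


code_point = naive_decode_two_byte(0xC0, 0xAE)
print(hex(code_point), repr(chr(code_point)))  # 0x2e '.'
```

The lead byte 0xC0 contributes nothing but zero bits, so the result fits in seven bits and should have been written as the single byte 0x2E. That padding is what makes the sequence "overlong".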

RFC 3629 states that "Implementations of the decoding algorithm MUST protect against decoding invalid sequences."
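As a rough sketch of what that protection looks like in practice, the snippet below percent-decodes a URL path to raw bytes and then decodes those bytes with a strict UTF-8 decoder. The function name and the sample paths are assumptions made for this example; it shows the general technique, not how Tomcat or IIS implement it.

```python
from urllib.parse import unquote_to_bytes


def decode_path_strict(raw_path: str) -> str:
    """Percent-decode a URL path, then decode it as strict UTF-8.

    Hypothetical helper for illustration. Python's built-in UTF-8 codec
    rejects overlong sequences, so an invalid path raises
    UnicodeDecodeError instead of silently turning into '.' characters.
    """
    raw_bytes = unquote_to_bytes(raw_path)
    return raw_bytes.decode("utf-8", errors="strict")


print(decode_path_strict("/docs/index%2ehtml"))  # canonical escape is accepted

try:
    decode_path_strict("/%c0%ae%c0%ae/etc/passwd")
except UnicodeDecodeError as err:
    print("rejected:", err.reason)
```

The important design point is ordering: decode and validate first, and only then run path checks such as looking for "..", so the check sees the same characters the file system will.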

UrlScan for IIS6 and Request Filtering in IIS7 both detect and reject non-canonical UTF-8 URLs by default.

A patch for Apache Tomcat is available at https://tomcat.apache.org/security.html.

Comments

  • Anonymous
    August 23, 2008
    Regarding "long time": canonicalization bugs still exist in Windows, so it is not solely a Tomcat problem.

  • Anonymous
    August 25, 2008
    Is the problem in Apache, or is it in the 3rd-party i18n library they used for translation? (Rolling your own is never a good idea, but take in a 3rd-party lib and you assume all their bugs and errors, something Larry Osterman talked a good deal about in his threat-modeling set of posts.)

  • Anonymous
    August 25, 2008
    Ted, I mean this particular bug type, not canonicalization generically.

  • Anonymous
    August 25, 2008
    Nathan, I don't know where the issue is - I doubt it's httpd though.
