Unicode and ISAPI Filters
Question:
How can one use the GetHeader API of PHTTP_FILTER_PREPROC_HEADERS to retrieve fields, such as "url" and others, in Unicode? I could not find any documentation on this topic.
Thanks a lot,
Answer:
It is not possible to retrieve a value in Unicode using GetHeader.
ISAPI Filter is an ANSI API: all values you get or set through it must be ANSI. Yes, I know this is shocking; after all, it is 2006 and everything nowadays is in Unicode... but remember that this API originated more than a decade ago, when barely anything was 32-bit, much less Unicode. Also, remember that the HTTP protocol, which ISAPI directly manipulates, is in ANSI and not Unicode.
Now, one should note that GetServerVariable() of ISAPI Filter on IIS6, like ISAPI Extension on IIS6, is able to retrieve server variable values in Unicode, using the UNICODE_ prefix in front of server variable names. Prior versions of IIS neither support nor retrieve any values in Unicode in ISAPI.
However, this does NOT apply to request headers retrieved via the HEADER_ or HTTP_ prefix (for example, you cannot use UNICODE_HTTP_ACCEPT_ENCODING to retrieve a Unicode value for the Accept-Encoding HTTP request header). It would not make sense anyway, because HTTP headers are not transported in Unicode. You might as well make the MultiByteToWideChar() call yourself if you want the value in Unicode.
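To make "do the conversion yourself" concrete, here is a minimal, portable sketch of what widening a header value amounts to. It assumes the bytes are UTF-8 (plain ASCII is a subset); on Windows, MultiByteToWideChar() with the appropriate code page would normally do this work. The function name utf8_to_utf16 is hypothetical and not part of any ISAPI API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: decodes UTF-8 bytes into UTF-16 code units.
// Returns false on malformed input (bad lead or continuation byte,
// truncated sequence, overlong form, surrogate, or value above U+10FFFF).
// On failure, 'out' holds whatever was decoded before the error.
bool utf8_to_utf16(const std::string& in, std::u16string& out) {
    out.clear();
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;      // code point being assembled
        std::uint32_t min_cp = 0;  // smallest value allowed for this length
        std::size_t extra = 0;     // continuation bytes expected

        if (lead < 0x80)                { cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; min_cp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; min_cp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; min_cp = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte

        if (in.size() - i <= extra) return false;  // sequence is cut short
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min_cp || cp > 0x10FFFF) return false;    // overlong or out of range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;    // surrogates are not characters

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            assert(cp <= 0x10FFFF);
            cp -= 0x10000;  // split into a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += extra + 1;
    }
    return true;
}
```

A header such as Accept-Encoding is plain ASCII, so the conversion is trivial there; the validation only matters once bytes above 127 show up.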
The usual caveat with GetServerVariable() and ISAPI Filters is that not all server variables are valid in any given ISAPI Filter event (for example, you cannot retrieve UNICODE_URL in SF_NOTIFY_PREPROC_HEADERS because it is not yet valid, but you can retrieve it in a later event like SF_NOTIFY_AUTHENTICATE).
In other words, for fields like URL that you would retrieve with GetHeader, it is simply not possible for you to retrieve them in Unicode during the SF_NOTIFY_PREPROC_HEADERS filter event.
//David
Comments
Anonymous
February 04, 2006
The filter API, though, doesn't get the URL decoding correct in all circumstances when the URL is UTF8 encoded. With an extension installed to handle 404 errors, the URLs seen there are correct.
For example, the filter will get this URL correct:
http://www.kirit.com/Niccol%C3%B2%20Machiavelli/The%20Prince/Dedication
But a URL like this one will be correct in the 404 handler (which is why it manages to find the page), but the filter cannot find it as the URL was incorrectly decoded.
http://www.kirit.com/Categories:/Microsoft%20Windows%E2%84%A2
Is there a way to get the actual URL that was requested without it being decoded? Then it would also be able to use a custom decoding scheme (for example treating underscores as spaces to make URLs easier to read than having %20s in them).
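As an illustration of the custom decoding scheme described here, the sketch below decodes a raw (still %-encoded) path so that a literal underscore becomes a space while %5F stays a real underscore. It assumes the raw request path is available as a byte string, which is exactly what is being asked for; decode_path and hex_value are hypothetical names, not ISAPI functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns 0-15 for a hexadecimal digit, or -1 if c is not one.
int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical decoder for a raw URL path (no query string).
// A literal '_' is read as a space; "%5F" yields a real underscore because
// escapes are decoded after the literal check. Returns false on a
// truncated or non-hexadecimal escape.
bool decode_path(const std::string& raw, std::string& out) {
    out.clear();
    for (std::size_t i = 0; i < raw.size(); ++i) {
        const char c = raw[i];
        if (c == '_') {
            out.push_back(' ');
        } else if (c == '%') {
            if (raw.size() - i < 3) return false;  // escape is cut short
            const int hi = hex_value(raw[i + 1]);
            const int lo = hex_value(raw[i + 2]);
            if (hi < 0 || lo < 0) return false;    // not two hex digits
            assert(hi < 16 && lo < 16);
            out.push_back(static_cast<char>(hi * 16 + lo));
            i += 2;  // skip the two hex digits just consumed
        } else {
            out.push_back(c);
        }
    }
    return true;
}
```

The output is still a byte string; if the escapes carried UTF-8, it would need a further UTF-8 decoding step before being used as a Unicode path.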
The level the filter is running on at this site is SF_NOTIFY_AUTH_COMPLETE.
Anonymous
February 04, 2006
Kirit - Filter API is ANSI. A decoded UTF8 character may or may not be in the current server's code page, so the behavior of UTF8 URLs (and the corresponding Unicode filesystem namespace) will always be hit-or-miss when get/set with the Filter API.
I am not certain what is correct or incorrect in your ISAPI Filter examples, 404 handlers, file not found, etc., nor what API call you are making to retrieve the values, so I cannot comment further.
FYI: Custom mapping of underscore to spaces does not require URL manipulation.
This general problem space is addressed by GetServerVariable( "UNICODE_URL" ) as well as HSE_REQ_EXEC_UNICODE_URL in ISAPI Extension.
//David
Anonymous
February 04, 2006
David, thanks for your quick reply (and on a Sunday to boot).
I'm using GetServerVariable with UNICODE_URL. What I really want to be able to get a hold of is what the UA actually sends to the server. With that I can decode it myself (like I already do with the query encodings if I choose to do those in a way IIS doesn't understand).
For the underscores, I wouldn't need to get the URL as sent by the UA unless I wanted to encode the underscore as %5F which would be a pretty natural encoding. If I leave that to IIS I won't know which underscores are which.
There is a similar issue with the URL redirect mechanism IIS uses for a custom 404 handler. It builds a URL to pass in as a query string to the handler which is all very well, but it only provides the decoded URL (which it decodes to give a different file specification than is available in the ISAPI filter). So, for example, a question mark in the URL encoded as %3F is then indistinguishable from one that starts the query string (although the query string is left encoded so that can be reliably decoded), see http://www.kirit.com/Errors%20in%20IIS's%20custom%20404%20error%20handling.
There is yet another example of an encoding/decoding problem in the Response.Redirect() method too. It expects an un-encoded URL (or at least that the file specification part is not encoded) which in effect makes assumptions about the format of a valid URL. Even the W3C isn't immune to these problems (see http://www.kirit.com/W3C's%20CSS%20validation%20service).
So, is there no way to find what the UA actually sent to the server with the GET or POST (or whatever)? If so this is a real shame and a seemingly pointless oversight.
Anonymous
February 04, 2006
On re-reading that I realise that I'm not making an awful lot of sense.
The basic point is that if the filter API is going to decode the URL given by the browser based on its own idea of which code page to use, then the obvious answer is to allow access to the string given by the browser in unmodified form.
I can't show what the filter gives for the second example URL on the original comment because to do that will break the site. The filter correctly decodes the sequence %C3%B2 to a "ò", but in the second example the sequence %E2%84%A2 is decoded in such a way that it isn't possible to interpret. If the 8 bit string is treated as UTF8 then the first URL is broken because the "ò" is not valid at that location in a UTF8 string. I can't remember if the second URL as exposed to the filter could be decoded as a UTF8 sequence, I don't think so.
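The point about a decoded "ò" no longer being valid UTF8 can be checked mechanically. This is a small sketch, assuming "ò" was decoded to the single byte 0xF2 (as it would be in a Latin-1 or Windows-1252 code page); is_valid_utf8 is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: true if every byte of s belongs to a well-formed
// UTF-8 sequence (no stray bytes, truncation, overlong forms, surrogates,
// or values above U+10FFFF).
bool is_valid_utf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len = 1;       // total bytes in this sequence
        std::uint32_t cp = lead;   // code point being assembled
        std::uint32_t min_cp = 0;  // smallest value allowed for this length

        if (lead < 0x80)                { /* single ASCII byte */ }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; min_cp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; min_cp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; min_cp = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte

        if (s.size() - i < len) return false;  // sequence is cut short
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min_cp || cp > 0x10FFFF) return false;
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;
        assert(len >= 1 && len <= 4);
        i += len;
    }
    return true;
}
```

With this check, the bytes C3 B2 and E2 84 A2 from the two example URLs pass, while a lone F2 followed by a space fails, because F2 announces a four-byte sequence that never arrives.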
The reason why both pages work on the site is that the first is served by the ISAPI filter and the second is served by an ISAPI extension that is the custom handler for 404 errors. You can see this because the filter knows where the query starts so you can append a query string to the first URL and it will still serve the page. With the second if you append a query then the extension can't work out if the question mark is part of the path specification in the URL or not.
Between the two of them this allows most pages to work reasonably well (although the problems with the way that the 404 URL is passed to the extension makes it hard to decode any query strings that may be used).
What is most strange is that the way that the path specification in the URL is decoded by IIS when requesting UNICODE_URL is different to the way that the URL is decoded when IIS builds a URL to pass to the 404 handler (which is then fetched using GetServerVariable on the EXTENSION_CONTROL_BLOCK using UNICODE_QUERY_STRING).
I spent quite some time trying to work out what encoding the filter was assuming (I tried all the mbcs system calls etc), but I couldn't find how it was doing it at all.
Anonymous
February 05, 2006
The comment has been removed
Anonymous
February 05, 2006
The comment has been removed
Anonymous
February 13, 2006
I am using the GetHeader and SetHeader functions in "SF_NOTIFY_AUTH_COMPLETE".
The signature of the function is:
BOOL (WINAPI * SetHeader) (
struct _HTTP_FILTER_CONTEXT * pfc,
LPSTR lpszName,
LPSTR lpszValue
);
As mentioned in MSDN, LPSTR is a "Pointer to a null-terminated string of 8-bit Windows (ANSI) characters."
So my question is: is there a function which I can use to manipulate the headers using wide char functions?
I am not aware of how certificates work, but I assume that they provide data which could be in Unicode, and then I could be in trouble.
Can someone help me sort this out?
Thanks in advance,
Anonymous
February 13, 2006
Suyog - ISAPI Filter API provides no mechanism to manipulate anything using WideChar. Neither does ISAPI Extension API except for very specific functions and even then, IIS only accepts URL and FilePath in Unicode. Remember, HTTP is not in WideChar.
If you need to transport Unicode values through a non-Unicode transport, then you should do what Browsers do - transport Unicode as %-encoded UTF8, which transports as ANSI but can be decoded back to Unicode.
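A minimal sketch of the sending side of that approach, assuming the value is already held as UTF-8 bytes: every byte outside the URL "unreserved" set is written as %XX, so the result is pure ASCII and survives any ANSI API. The name percent_encode is hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: %-encodes a UTF-8 byte string. Letters, digits and
// "-._~" pass through unchanged; every other byte becomes %XX with
// uppercase hexadecimal digits.
std::string percent_encode(const std::string& utf8) {
    static const char kHex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char b : utf8) {
        const bool unreserved =
            (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z') ||
            (b >= '0' && b <= '9') ||
            b == '-' || b == '.' || b == '_' || b == '~';
        if (unreserved) {
            out.push_back(static_cast<char>(b));
        } else {
            out.push_back('%');
            out.push_back(kHex[b >> 4]);    // high nibble
            out.push_back(kHex[b & 0x0F]);  // low nibble
        }
    }
    return out;
}
```

For example, the UTF-8 bytes of "Microsoft Windows™" come out as Microsoft%20Windows%E2%84%A2, the same form seen in the URL earlier in this thread. The receiver reverses the two steps: %-decode to bytes, then interpret the bytes as UTF-8.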
//David
Anonymous
February 23, 2006
I've been back through the code that I've been using. Although I wasn't able to recreate the problem (which is good I guess, but I would have preferred to understand why it was happening), I have written up everything that I did to track through it.
There's a full write up on fetching the Unicode values and how IIS expects path specifications to be encoded. I haven't found any official documentation on how IIS interprets this information or how it falls back to other encodings (or even which it uses when it does fall back). There are a number of examples in the article below and it looks like IIS (at least on my systems which may of course not be typical) prefers UTF-8 to any other encoding.
http://www.kirit.com/Getting%20the%20correct%20Unicode%20path%20within%20an%20ISAPI%20filter
Anyway, thanks for your attention.
Kirit
Anonymous
July 22, 2006
I am writing an ISAPI filter to measure some performance metrics in IIS 5.1 and 6.0 (on a per-request basis).
To correlate these metrics with the other processes that handle the requests, the response is expected to carry a header which I want to retrieve in my filter.
I have tried two functions so far, without success:
GetServerVariable
GetHeader
I have used them in the SF_NOTIFY_SEND_RESPONSE and SF_NOTIFY_LOG event handlers in my filter.
The return value is false and GetLastError returns -1.
Any help is appreciated.
Anonymous
July 23, 2006
The comment has been removed
Anonymous
September 23, 2006
Hi, I don't know ISAPI very well and I have a problem with ASP programming. When IIS executes the 404 handler, ??? appears instead of the characters because IIS doesn't support Unicode. I want IIS to support Unicode and show the proper characters. As far as I understand, this post refers to that problem.
I would appreciate it very much if you could give me the full source code to accomplish this and tell me how to install it. C++ is installed on my computer and I can compile your sample code. Thank you very much in advance.
Best regards
Sinan MERT
Anonymous
September 25, 2007
Hi David,
I wanted to know whether it is possible to write wide characters to the client through ISAPI. I am working on an application which was ANSI earlier and am now converting it to support Unicode. English characters which are converted to wide characters are displayed properly, but characters of other languages are not. So I wanted to know whether it is possible to write characters of other languages using ISAPI. Please provide any small ISAPI application or any web link that might be helpful.
Thanks in advance,
Uday
udaymshanbhag@gmail.com
Anonymous
December 05, 2007
Hi All,
I have developed an ISAPI Filter which saves the user name in a cookie in ANSI format, but the same cookie is also used by my web application, which uses UTF-8 to save and retrieve values in that cookie. I am developing a Swedish application which will run on Windows Server 2003 (Swedish version), and the user name contains Swedish characters (for example 'Administratör'). The filter stores this user name in the cookie in ANSI format, and when my web application retrieves the user name from the cookie, it reads it as 'Administratr' (without the ö).
Is there any way that my ISAPI Filter can store the user name in UTF-8 format in the cookie, or that we can set the HTTP page header to use UTF-8 format (encoding)?
Following is the code I have written to write the cookie:
sprintf(szCookie, "Set-Cookie: UserID=%s;expires=%s; path=/;\r\n", "Administratör", CurrentDate);
pfc->AddResponseHeaders(pfc, szCookie, 0);
Regards
Asim
Anonymous
December 31, 2007
Asim - I suggest you properly encode and decode your values so that they pass through transparently. HTTP 1.1 headers are defined by RFC 2616 in terms of OCTETs (any 8-bit value), which means that while you can put any 8 bits into them, including the "ö", the recipient also interprets them as OCTETs. There is no way to "set the HTTP page header to use UTF8 format" since the format is already defined.
In your case, you put in the "ö" (a character above 127) as ANSI, but the page interpreted it as UTF8, where a lone byte above 127 is not valid. If you %-encode the UTF8 version within the ISAPI (and correspondingly %-decode in the application), the character should transport correctly.
//David
Anonymous
January 27, 2008
The comment has been removed
Anonymous
January 30, 2008
rob - A %-encoded UTF8 value as URL works. ISAPI Filter API is ANSI, so you must use %-encoded UTF8 (and decode it yourself) to pass values around. That is the way to pass Unicode values through ANSI APIs. The encoding "in your pages" has no effect on the interpretation of the URL or querystring.
//David