It seems that IIS incorrectly delivers the request URL to a web application if the URL contains UTF-8-encoded characters that are not supported by the current system locale. All "unsupported" characters are replaced by question marks ('?').
Example: The system locale is set to Norwegian. The following URL works fine:
/myapp/Blåbærsyltetøy/
The following URL does not work:
/myapp/черничный-джем/
In both URLs, non-ASCII characters are encoded as UTF-8 and then percent-encoded, so the actual URLs look like this:
/myapp/Bl%C3%A5b%C3%A6rsyltet%C3%B8y/
/myapp/%D1%87%D0%B5%D1%80%D0%BD%D0%B8%D1%87%D0%BD%D1%8B%D0%B9-%D0%B4%D0%B6%D0%B5%D0%BC/
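For reference, the two encoded forms above can be reproduced with Python's standard library. This is only an illustration of the encoding step, which a browser or HTTP client normally performs on its own:

```python
from urllib.parse import quote

# quote() encodes non-ASCII characters as UTF-8 by default and then
# percent-encodes each resulting byte; "/" and "-" are left untouched.
norwegian = quote("/myapp/Blåbærsyltetøy/")
russian = quote("/myapp/черничный-джем/")

print(norwegian)
print(russian)
```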
The application handles requests in two ways: through an ISAPI extension and through wfastcgi.
Both suffer from the same problem, and both work correctly if the URL contains only characters that are supported by the system locale.
In the case of ISAPI, it looks like EXTENSION_CONTROL_BLOCK::lpszPathInfo already delivers a percent-decoded URL in which all "unsupported" characters have been replaced by question marks. The EXTENSION_CONTROL_BLOCK::lpszPathInfo member is a multi-byte character string, and there is no wide-character string version of this structure.
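The symptom is consistent with the decoded path being converted to the locale's ANSI code page at some point. The sketch below only imitates that effect in Python. It assumes code page 1252 for a Norwegian locale, and the function name is made up for illustration; it does not describe what IIS does internally.

```python
from urllib.parse import unquote_to_bytes

def simulate_locale_conversion(encoded_path, codepage="cp1252"):
    """Imitate a lossy conversion: UTF-8 bytes -> text -> ANSI code page."""
    text = unquote_to_bytes(encoded_path).decode("utf-8")
    # errors="replace" substitutes "?" for every character that the
    # target code page cannot represent.
    return text.encode(codepage, errors="replace").decode(codepage)

ok = simulate_locale_conversion("/myapp/Bl%C3%A5b%C3%A6rsyltet%C3%B8y/")
broken = simulate_locale_conversion(
    "/myapp/%D1%87%D0%B5%D1%80%D0%BD%D0%B8%D1%87%D0%BD%D1%8B%D0%B9-"
    "%D0%B4%D0%B6%D0%B5%D0%BC/"
)
print(ok)
print(broken)
```

The Norwegian path survives because å, æ, and ø exist in code page 1252; the Cyrillic letters do not, so each one turns into a question mark.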
Is there a way to get the original, percent-encoded URL or prevent IIS from decoding URLs to work around the problem?
Solution for ISAPI
Get the request URL from the server variable HTTP_URL rather than PATH_INFO. This delivers the original, percent-encoded URL, which can then be decoded correctly (by percent-decoding it to an array of bytes and interpreting those bytes as a UTF-8-encoded string).
This variable contains the query string and the original path before URL rewriting, which may be unwanted, so it may need some extra processing.
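The decoding step can be sketched as follows. An ISAPI extension would implement this in C or C++; Python is used here only to show the logic, and the function name is hypothetical:

```python
from urllib.parse import unquote_to_bytes

def path_from_http_url(http_url):
    """Return the decoded path of a raw HTTP_URL value, without the query string."""
    # Drop the query string, which HTTP_URL includes but PATH_INFO does not.
    raw_path, _, _query = http_url.partition("?")
    # Percent-decode to bytes first, then interpret those bytes as UTF-8.
    return unquote_to_bytes(raw_path).decode("utf-8")

decoded = path_from_http_url(
    "/myapp/%D1%87%D0%B5%D1%80%D0%BD%D0%B8%D1%87%D0%BD%D1%8B%D0%B9-"
    "%D0%B4%D0%B6%D0%B5%D0%BC/?page=2"
)
print(decoded)
```

Splitting off the query string before decoding matters: a percent-encoded '?' inside the path would otherwise be mistaken for the query separator.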
Also, for error handler requests, this variable contains a string in a format similar to
<DLL_PATH>?<STATUS_CODE>;<ORIGINAL_HTTP_URL>
which needs to be parsed. But it contains all the information that PATH_INFO contains, only without the incorrect decoding.
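A minimal sketch of parsing the error-handler form, again in Python for brevity. It assumes the string follows the pattern shown above exactly; the function name, handler.dll, and example.com are placeholders, and real values may differ in detail:

```python
from urllib.parse import urlsplit, unquote_to_bytes

def parse_error_handler_url(http_url):
    """Split '<DLL_PATH>?<STATUS_CODE>;<ORIGINAL_HTTP_URL>' into its parts."""
    dll_path, _, rest = http_url.partition("?")
    status, sep, original = rest.partition(";")
    if not sep or not status.isdigit():
        raise ValueError("not an error-handler URL: %r" % http_url)
    # The original URL may be absolute or relative; keep only its path.
    raw_path = urlsplit(original).path
    return dll_path, int(status), unquote_to_bytes(raw_path).decode("utf-8")

parts = parse_error_handler_url(
    "/myapp/handler.dll?404;http://example.com:80/myapp/%D1%87%D0%B5%D1%80/"
)
print(parts)
```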
Note: Getting PATH_INFO using GetServerVariable, rather than from the EXTENSION_CONTROL_BLOCK structure, does not solve the encoding problem.
Solution for wfastcgi
Server variables are encoded using the system locale (called 'mbcs' in Python) by default. This behavior can be changed by setting a registry value:
reg add HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\w3svc\Parameters /v FastCGIUtf8ServerVariables /t REG_MULTI_SZ /d REQUEST_URI\0PATH_INFO
Note that this will affect all wfastcgi applications on the same server and may break existing applications which do not expect variables to be UTF-8-encoded (rather unlikely, as any sane application that uses non-ASCII URLs would use UTF-8 encoding...).
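On the application side, the path may still need recoding, depending on how the gateway hands the bytes to Python. The helper below is a sketch that assumes the PEP 3333 convention, where the server exposes raw bytes as a latin-1 "native string"; whether your wfastcgi version behaves this way should be verified, and the function name is hypothetical.

```python
def wsgi_path_to_text(value):
    """Recode a PEP 3333 native string that carries UTF-8 bytes (assumed convention)."""
    try:
        # Recover the raw bytes, then interpret them as UTF-8.
        return value.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # The value was already proper text (or not UTF-8); leave it alone.
        return value

# Simulate what a PEP 3333 server would put into environ["PATH_INFO"].
raw = "/myapp/черничный-джем/".encode("utf-8").decode("latin-1")
print(wsgi_path_to_text(raw))
```

The fallback keeps the helper harmless when the value has already been decoded correctly, at the cost of not detecting the rare latin-1 text that happens to be valid UTF-8.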