mediawiki urlencode

IIS 10 + MediaWiki: Wrong encoding for short URLs


I am running MediaWiki on Windows Server 2016 (language set to Korean) in a local network. I followed the MediaWiki manual for enabling short URLs, and the results were as follows:

A. 대문¹ becomes ´ë¹® and is inaccessible via its short URL.

B. 媛곴컖 becomes 각각.² A link to 媛곴컖 points to http://192.168.0.123/wiki/媛곴컖 (or http://192.168.0.123/wiki/%E5%AA%9B%EA%B3%B4%EC%BB%96) and shows the content of page 각각 when clicked.

In both cases, their full URLs (like http://192.168.0.123/w/index.php?title=%EB%8C%80%EB%AC%B8) work 99%³ perfectly.
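
For reference, the percent-encoded strings in these URLs are simply the UTF-8 bytes of the page titles. A minimal Python sketch reproduces them (Python is used here purely for illustration and is not part of the IIS or MediaWiki setup):

```python
from urllib.parse import quote, unquote

# Percent-encode the UTF-8 bytes of each title, as a browser does for a URL path.
main_page = quote("대문")
mojibake = quote("媛곴컖")

print(main_page)           # percent-encoded form of 대문
print(mojibake)            # percent-encoded form of 媛곴컖
print(unquote(main_page))  # decoding restores the original title
```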

After some investigation, I traced both cases to the following encoding round trips:

(A)
EB 8C 80 EB AC B8
↓ Decoded with UTF-8
대문
↓ Encoded with EUC-KR
B4 EB B9 AE
↓ Decoded with Latin-1
´ë¹®

(B)
E5 AA 9B EA B3 B4 EC BB 96
↓ Decoded with UTF-8
媛곴컖
↓ Encoded with EUC-KR
EA B0 81 EA B0 81
↓ Decoded with UTF-8
각각

It seems that EUC-KR is involved at some point in the URL encoding/decoding process, but I could not find the relevant setting anywhere in the IIS or MediaWiki configuration.
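
Both chains can be reproduced with a short Python sketch. One assumption to flag: it uses Python's `cp949` codec (Windows code page 949, a superset of EUC-KR) as a stand-in for the "EUC-KR" step, because the bytes in chain (B) include sequences such as `81 EA` that fall in the code page 949 extension area. The function names `chain_a` and `chain_b` are made up for illustration.

```python
def chain_a(raw: bytes) -> str:
    """UTF-8 decode -> Korean code page encode -> Latin-1 decode."""
    title = raw.decode("utf-8")     # what the browser sent
    legacy = title.encode("cp949")  # assumed stand-in for the EUC-KR step
    return legacy.decode("latin-1")


def chain_b(raw: bytes) -> str:
    """UTF-8 decode -> Korean code page encode -> UTF-8 decode."""
    title = raw.decode("utf-8")
    legacy = title.encode("cp949")  # assumed stand-in for the EUC-KR step
    return legacy.decode("utf-8")


result_a = chain_a(bytes.fromhex("EB8C80EBACB8"))
result_b = chain_b(bytes.fromhex("E5AA9BEAB3B4ECBB96"))
print(result_a)
print(result_b)
```

If the two printed strings match the mojibake in (A) and the page title in (B), that supports the guess that a Korean legacy code page sits somewhere between the request and PHP.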


¹ Korean word for "Main Page"

² 媛곴컖 is mojibake of 각각, which means "each" in Korean. The meaning itself is irrelevant here.

³ The remaining 1% is VisualEditor replacing category links with their full URLs, e.g. Category:Foo becomes Category:index.php?title=Foo.


Solution

  • After several months of wandering, I found a workaround:

    1. As shown in https://www.mediawiki.org/wiki/Topic:Rwbeswv4deqbzlvn, add the following line to LocalSettings.php:
    $_SERVER['REQUEST_URI'] = urlencode($_SERVER['REQUEST_URI']);
    
    2. In the rewrite rule in web.config, replace
    <action type="Rewrite" url="w/index.php?title={R:1}" logRewrittenUrl="true" />
    

    with

    <action type="Rewrite" url="w/index.php?title={UrlDecode:{R:1}}" logRewrittenUrl="true" />
    

    Now everything works fine.

    The full source of the working web.config is below:

    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
        <system.web>
            <globalization fileEncoding="utf-8" requestEncoding="utf-8" responseEncoding="utf-8" />
        </system.web>
        <system.webServer>
            <handlers>
                <remove name="PHP53_via_FastCGI" />
                <remove name="PHP_via_FastCGI" />
                <add name="PHP_via_FastCGI" path="*.php" verb="GET,HEAD,POST" modules="FastCgiModule" scriptProcessor="C:\php\php-cgi.exe" resourceType="Either" requireAccess="Script" />
            </handlers>
            <rewrite>
                <outboundRules>
                    <preConditions>
                        <preCondition name="ResponseIsHtml1">
                            <add input="{RESPONSE_CONTENT_TYPE}" pattern="^text/html" />
                        </preCondition>
                    </preConditions>
                </outboundRules>
                <rules useOriginalURLEncoding="false">
                    <rule name="RewriteUserFriendlyURL1" enabled="true" stopProcessing="true">
                        <match url="^wiki/([^/]+/?[^/]*/?[^/]*)/?$" />
                        <conditions>
                            <add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true" />
                            <add input="{REQUEST_FILENAME}" matchType="IsDirectory" negate="true" />
                        </conditions>
                        <action type="Rewrite" url="w/index.php?title={UrlDecode:{R:1}}" logRewrittenUrl="true" />
                    </rule>
                </rules>
            </rewrite>
        </system.webServer>
    </configuration>
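
To make the intent of the rewrite rule easier to follow, here is a rough Python emulation of what the `<match>` pattern captures and what percent-decoding the captured group yields. It is only a sketch: the helper name `rewrite` is hypothetical, the `IsFile`/`IsDirectory` conditions are not modeled, and it does not reproduce how IIS handles encodings internally.

```python
import re
from typing import Optional
from urllib.parse import unquote

# Pattern copied from the <match url="..."> attribute in web.config.
SHORT_URL = re.compile(r"^wiki/([^/]+/?[^/]*/?[^/]*)/?$")


def rewrite(path: str) -> Optional[str]:
    """Return the rewritten target for a short URL path, or None if no match."""
    m = SHORT_URL.match(path)
    if m is None:
        return None
    # Rough analogue of {UrlDecode:{R:1}}: percent-decode the captured title.
    return "w/index.php?title=" + unquote(m.group(1))


print(rewrite("wiki/%EB%8C%80%EB%AC%B8"))
print(rewrite("wiki/Foo/Bar"))
print(rewrite("w/index.php"))
```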