java regex servlet-filters tuckey-urlrewrite-filter

Removing the URL fragment (hash) using a Java WebFilter


I have the following configuration in the urlrewrite.xml:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE urlrewrite PUBLIC "-//tuckey.org//DTD UrlRewrite 4.0//EN" "http://www.tuckey.org/res/dtds/urlrewrite4.0.dtd">
<urlrewrite use-query-string="true">
    <rule>
        <from>^(/event/showEventList)(\.{1})(\bhtm\b|\bhtml\b)(\?{0,1})([a-zA-Z0-9-_=&amp;]{0,}+)(#{0,1})([a-zA-Z0-9-_=&amp;]{0,}+)$</from>
        <to type="redirect" last="true">/events$4$5</to>
    </rule>                 
</urlrewrite>

The regex ^(/event/showEventList)(\.{1})(\bhtm\b|\bhtml\b)(\?{0,1})([a-zA-Z0-9-_=&amp;]{0,}+)(#{0,1})([a-zA-Z0-9-_=&amp;]{0,}+)$ has 7 groups, which are:

  1. (/event/showEventList): matches /event/showEventList
  2. (\.{1}): matches a single dot (.)
  3. (\bhtm\b|\bhtml\b): matches only htm or html
  4. (\?{0,1}): matches a question mark (?), which may occur zero or one time
  5. ([a-zA-Z0-9-_=&amp;]{0,}+): matches the query string, which may be zero or more characters long
  6. (#{0,1}): matches a hash sign (#), which may occur zero or one time
  7. ([a-zA-Z0-9-_=&amp;]{0,}+): matches the fragment, which may be zero or more characters long
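
The group breakdown above can be checked with a short Java sketch that prints what each of the seven groups captures for the test URL. The class name GroupDump and the helper groups() are hypothetical names used only for illustration; the pattern is the one from the rule, with &amp; unescaped to & because it is no longer inside XML.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDump {

    // Same pattern as in urlrewrite.xml, with &amp; unescaped to & for Java.
    static final Pattern RULE = Pattern.compile(
        "^(/event/showEventList)(\\.{1})(\\bhtm\\b|\\bhtml\\b)(\\?{0,1})"
        + "([a-zA-Z0-9-_=&]{0,}+)(#{0,1})([a-zA-Z0-9-_=&]{0,}+)$");

    // Returns the text captured by groups 1..7, or an empty list if the input does not match.
    static List<String> groups(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = RULE.matcher(input);
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                result.add(m.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> g = groups("/event/showEventList.html?pageNumber=1#key=val");
        for (int i = 0; i < g.size(); i++) {
            System.out.println((i + 1) + ": \"" + g.get(i) + "\"");
        }
    }
}
```

When the input has no fragment, groups 6 and 7 still participate in the match but capture empty strings, so the rule matches either way.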

When I test this configuration with the URL /event/showEventList.html?pageNumber=1#key=val, I expect to be redirected to /events?pageNumber=1, but I get /events?pageNumber=1#key=val instead.

I tested the regex on its own with the following snippet:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlRewriterRegexTest {

    public static void main(String[] args) {
        String input = "/event/showEventList.html?pageNumber=1#key=val";
        String regex = "^(/event/showEventList)(\\.{1})(\\bhtm\\b|\\bhtml\\b)(\\?{0,1})([a-zA-Z0-9-_=&]{0,}+)(#{0,1})([a-zA-Z0-9-_=&]{0,}+)$";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(input);   
        System.out.println(matcher.replaceFirst("/events$4$5"));
    }
}

It prints /events?pageNumber=1, which is the result I expect.

Any pointers would be very helpful.


Solution

  • I am answering my own question so that it can help anyone who runs into the same problem in the future.

    The problem has nothing to do with the UrlRewriteFilter framework. After enabling its debug log, I saw that the URL it receives before applying the rules does not contain the hash (#). Other SO answers and the browser's network traffic confirmed why: the browser never sends the URL fragment to the server, so it is not available in the HttpServletRequest. That is why the fragment part of the regular expression has nothing to match on the server. The rule still matches and redirects to /events?pageNumber=1, but because the Location header carries no fragment of its own, the browser re-attaches the original fragment to the redirect target, which gives /events?pageNumber=1#key=val.

    Since the hash is available in the client browser, I solved the problem there with JavaScript and the HTML5 History API:

    <script type="text/javascript">
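        // Once the DOM is ready, strip the fragment from the address bar without reloading the page.
        window.addEventListener('DOMContentLoaded', () => {
            const url = new URL(window.location.href);
            url.hash = ''; // an empty hash also removes the '#' itself
            // replaceState rewrites the current history entry instead of adding a new one.
            history.replaceState(null, document.title, url);
        });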
    </script>
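
To illustrate why the server-side rule never has a fragment to work with, here is a minimal Java sketch that splits the test URL the way a browser does before sending the request. It assumes that, with use-query-string="true", the filter matches against the path plus the query string. The class FragmentDemo, its helper methods, and the host example.com are hypothetical and only serve the illustration.

```java
import java.net.URI;
import java.util.regex.Pattern;

public class FragmentDemo {

    // The rule's pattern from the question, with &amp; unescaped to & for Java.
    static final Pattern RULE = Pattern.compile(
        "^(/event/showEventList)(\\.{1})(\\bhtm\\b|\\bhtml\\b)(\\?{0,1})"
        + "([a-zA-Z0-9-_=&]{0,}+)(#{0,1})([a-zA-Z0-9-_=&]{0,}+)$");

    // Path plus optional query: the part of a URL that travels in the HTTP
    // request line. The fragment is deliberately left out, as a browser does.
    static String requestTarget(URI uri) {
        String query = uri.getRawQuery();
        return query == null ? uri.getRawPath() : uri.getRawPath() + "?" + query;
    }

    // Applies the rule's substitution to whatever the server received.
    static String rewrite(String target) {
        return RULE.matcher(target).replaceFirst("/events$4$5");
    }

    public static void main(String[] args) {
        // example.com is a placeholder host, not taken from the question.
        URI uri = URI.create("http://example.com/event/showEventList.html?pageNumber=1#key=val");
        String target = requestTarget(uri);
        System.out.println("client-side fragment: " + uri.getRawFragment());
        System.out.println("server receives:      " + target);
        System.out.println("rule redirects to:    " + rewrite(target));
    }
}
```

In this sketch the rewritten target contains no fragment at all, which is consistent with the debug log: any #key=val seen in the address bar after the redirect has to come from the browser, not from the rule.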