regex url

Regex to match two parts of a URL


I am trying to create a RegEx pattern which will match two parts of a URL.

Example URL:

app.company.com/base-path?parameter1=stuff&parameter2=morestuff&parameter3=IMPORTANT%20THING

I want the pattern to match only when the URL contains both a base path and the third parameter, and to capture both parts: /base-path and the whole of parameter3=IMPORTANT%20THING


Solution

  • Here is a pattern that matches both parts:

    /^.+?(\/.+?)\?.+?&(parameter3=.+)$/gm
    

    Group 1 captures the base path, and group 2 captures parameter3 together with its value. I do not know which language you are using. This is the PCRE2 flavor, which PHP 7.3+ uses, but it should be easy to port to other languages.

    Security risk

    Using a regex here carries some risk: an attacker can craft a malicious parameter1 or parameter2 value that fools the pattern, so you get an unexpected result, especially AFTER DECODING THE URL.

    For example, take this URL:

    app.company.com/base-path?parameter1=stuff&parameter2=%26parameter3%3Dmorestuff&parameter3=IMPORTANT%20THING
    

    The attacker sets parameter2=%26parameter3%3Dmorestuff, so after decoding you get this URL:

    app.company.com/base-path?parameter1=stuff&parameter2=&parameter3=morestuff&parameter3=IMPORTANT THING
    

    The regex then captures parameter3=morestuff&parameter3=IMPORTANT THING, which is not the value you expected.

    So, if you really want to use a regex, DO NOT DECODE THE URL BEFORE MATCHING.
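
As a rough illustration of the points above, here is a minimal Python sketch. Python is only an assumption, since the question does not name a language. It ports the pattern using re.MULTILINE in place of the /m flag, then runs it on the spoofed URL in both orders. The helper names match_raw and match_decoded are hypothetical and exist only for this example.

```python
import re
from urllib.parse import unquote

# Python port of the answer's pattern; re.MULTILINE stands in for the /m flag.
# Group 1 is the base path, group 2 is parameter3 with its value.
PATTERN = re.compile(r"^.+?(/.+?)\?.+?&(parameter3=.+)$", re.MULTILINE)


def match_raw(url):
    """Match the still-encoded URL, then decode only the captured value."""
    m = PATTERN.search(url)
    if m is None:
        return None
    return m.group(1), unquote(m.group(2))


def match_decoded(url):
    """Risky order: decode the whole URL first, then match."""
    m = PATTERN.search(unquote(url))
    if m is None:
        return None
    return m.group(1), m.group(2)


# The spoofed URL from the answer: parameter2 smuggles an encoded "&parameter3=".
spoofed = (
    "app.company.com/base-path?parameter1=stuff"
    "&parameter2=%26parameter3%3Dmorestuff"
    "&parameter3=IMPORTANT%20THING"
)

print(match_raw(spoofed))
# ('/base-path', 'parameter3=IMPORTANT THING')
print(match_decoded(spoofed))
# ('/base-path', 'parameter3=morestuff&parameter3=IMPORTANT THING')
```

Matching the raw URL works because the encoded %26 and %3D are not literal & and = characters, so the pattern cannot mistake them for a parameter boundary. Decoding only the captured group afterwards keeps the readable value without reopening that hole.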