Google Cloud Platform lets you create label logs using the RE2 regex engine.
How can I create a regex that matches the path in the URL?
Examples matches:
https://example.com/awesome --> "awesome"
https://example.com/awesome/path --> "awesome/path"
https://example.com/awesome/path/ --> "awesome/path"
https://example.com/awesome/path?arg1=123 --> "awesome/path"
Details:
https://example.com
here./
in between./
should NOT be matched.?arg1=123&arg2=456
should NOT be matched.a-zA-Z0-9
, dashes -
and underscores _
.Note that Google RE2 is different than PCRE2.
So the syntax isn't 100% clear what is supported and what isn't. Assuming (NOT SUPPORTED) VIM means it is supported but not on vim, I'd start with a negative look behind for the beginning of the url that you don't care about
(?<=https:\/\/example\.com\/)
Then you want alphanumeric characters [\w\-]+
followed by non trailing /
so I'd add a lookahead to verify that there are alphanumeric characters after the /
with (?=\/\w+)\/
The complete regex
(?<=https:\/\/example\.com\/)([\w\-]+((?=\/\w+)\/|\b))+