regex perl ssrf

How to extract the hostname from a URI that includes a username and password in Perl?


The sandbox web app I'm working with is a form that receives a LinkedIn URL and uses it to pull the photo from that LinkedIn profile. However, its current implementation has an SSRF vulnerability: the regex we use to validate the link doesn't account for URLs in which the text right after the scheme is a username and password rather than the hostname (e.g. https://username:password@hostname.com). Here's the Perl code:

#!/usr/bin/perl

package External;

use warnings;

sub isValidLinkedinProfileUrl {
    my $linkedin_url = $_[0];
    my $pattern = qr/^https:\/\/www\.linkedin\.com/;
    return $linkedin_url =~ $pattern;
}

1;

Let's say we give it a "malicious" URL: (https://www.linkedin.com:password@hacker-site.notld)

Our regex would pass it, because the string begins with LinkedIn's hostname, even though the request would actually go to hacker-site.notld. How can we fix this regex to "make sure that it takes into account URLs that might start with https://www.linkedin.com but actually go to a different host"?

I've tried adjusting the regex to use a pipe, but I don't really know how I should be adjusting it to "fix" it. I assume I should modify it to allow only LinkedIn hostnames? If I anchor it as /^https:\/\/www\.linkedin\.com$/, that won't work either, because it then rejects every URL that has a path after the hostname. I'm not sure what else to try.
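For reference, here is a minimal script that reproduces both attempts. The profile path /in/someone/ is a made-up example, and the variable names are only for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Original check: anchors only the start of the string.
my $prefix_only = qr/^https:\/\/www\.linkedin\.com/;

# Attempted fix: anchoring both ends leaves no room for a path.
my $fully_anchored = qr/^https:\/\/www\.linkedin\.com$/;

# Hypothetical inputs; /in/someone/ is an illustrative profile path.
my $genuine   = 'https://www.linkedin.com/in/someone/';
my $malicious = 'https://www.linkedin.com:password@hacker-site.notld';

for my $url ( $genuine, $malicious ) {
    printf "%-55s prefix_only=%s fully_anchored=%s\n",
        $url,
        ( $url =~ $prefix_only    ? 'pass' : 'fail' ),
        ( $url =~ $fully_anchored ? 'pass' : 'fail' );
}
```

The prefix-only pattern passes both URLs and the fully anchored one fails both, so neither pattern separates the genuine link from the malicious one.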


Solution

  • Parse the URL with the URI module and compare its components instead of matching the raw string. To match any URL on LinkedIn's HTTPS host and none other:

    use URI qw( );
    
    sub is_linkedin_url {
       my $url = URI->new( shift );
    
       # Suppress "uninitialized" warnings: `scheme` and `host`
       # can return `undef` (e.g. `scheme` for relative URLs),
       # which simply makes the comparison fail.
       no warnings qw( uninitialized );
    
       return
          (  $url->scheme eq 'https'
          && lc( $url->host ) eq 'www.linkedin.com'
          && $url->port == 443
          );
    }
    

    To match only profile URLs, additionally check the path:

    use URI qw( );
    
    sub is_linkedin_profile_url {
       my $url = URI->new( shift );
    
       # Suppress "uninitialized" warnings: `scheme` and `host`
       # can return `undef` (e.g. `scheme` for relative URLs),
       # which simply makes the comparison fail.
       no warnings qw( uninitialized );
    
       return
          (  $url->scheme eq 'https'
          && lc( $url->host ) eq 'www.linkedin.com'
          && $url->port == 443
          && $url->path =~ m{^/in/[^/]+/\z}
          );
    }
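
As a quick sanity check, the following sketch runs the profile check against a handful of inputs. The test URLs (including the /in/someone/ path and the port 8443) are made up for illustration, and the script assumes the URI module from CPAN is installed:

```perl
use strict;
use warnings;
use URI qw( );

# Same check as in the answer, repeated so this script runs on its own.
sub is_linkedin_profile_url {
   my $url = URI->new( shift );

   no warnings qw( uninitialized );

   return
      (  $url->scheme eq 'https'
      && lc( $url->host ) eq 'www.linkedin.com'
      && $url->port == 443
      && $url->path =~ m{^/in/[^/]+/\z}
      );
}

# Hypothetical inputs, each chosen to exercise one of the conditions.
my @urls = (
   'https://www.linkedin.com/in/someone/',                             # genuine
   'https://www.linkedin.com:password@hacker-site.notld/in/someone/',  # userinfo trick
   'http://www.linkedin.com/in/someone/',                              # wrong scheme
   'https://www.linkedin.com:8443/in/someone/',                        # wrong port
   '/in/someone/',                                                     # relative URL
);

for my $candidate ( @urls ) {
   printf "%-8s %s\n",
      ( is_linkedin_profile_url( $candidate ) ? 'accept' : 'reject' ),
      $candidate;
}
```

Only the first URL should be accepted. In the userinfo case, `host` returns hacker-site.notld, so the comparison fails. Note that the path pattern as written requires a trailing slash, so https://www.linkedin.com/in/someone would be rejected too.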