[SOLVED] Extract URLs from an obfuscated JS file

I'm trying to extract all the URLs mentioned in an obfuscated JS file. Because of the obfuscation, all the URLs sit on a single line, and so far my script extracts only one of them. Here is the code I'm using for the extraction:

while( my $line = <$info>) {
    chomp ($line); #removing the unwanted new line character
    my ($uri)= $line =~ /$RE{URI}{HTTP}{-scheme=>'https?'}{-keep}/  ;
    $uri=~s/[,\']//g;
    print "$uri\n" if ($uri);
}

How can I change this code so that it extracts all the URLs? It works well with normal (non-obfuscated) JS files.

Solution

Try this. The /g modifier at the end of the regex lets it step from match to match on successive invocations: in scalar context, each match resumes where the previous one ended, so the inner while loop walks along the line one URL at a time. See "Global matching" in "perldoc perlretut", the Perl regular expressions tutorial.

The parentheses I added around $re capture the matched text and assign it to $1. See "Extracting matches", also in "perldoc perlretut".

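use strict;
use warnings;
use Regexp::Common qw(URI);    # provides $RE{URI}{HTTP}

# Build the pattern once, outside the loop.
my $re = $RE{URI}{HTTP}{-scheme=>'https?'}{-keep};

# DATA is used here for testing; substitute your own filehandle ($info).
while ( my $line = <DATA> ) {
    chomp($line);    # remove the trailing newline
    # With /g in scalar context, each pass resumes after the previous match.
    while ( $line =~ /($re)/g ) {
        my $uri = $1;
        $uri =~ s/[,']//g;    # strip commas and single quotes from the match
        print "$uri\n" if $uri;
    }
}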
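
If you would rather collect the URLs than print them as you go, the same /g modifier can be used in list context, where a single match returns every hit at once. This is only a rough sketch: $url_re is a simplified stand-in for the Regexp::Common pattern (an assumption, so the snippet runs without the module), and the input line is made up for illustration.

```perl
use strict;
use warnings;

# Simplified stand-in for $RE{URI}{HTTP}{-scheme=>'https?'}; an assumption
# so that this sketch runs without Regexp::Common installed.
my $url_re = qr{https?://[^\s"'<>(),;]+};

# Hypothetical one-line "obfuscated" input.
my $line = q{var a="http://example.com/a.js",b='https://example.org/b?x=1';load(a,b);};

# In list context, a /g match with no capture groups returns every match.
my @uris = $line =~ /$url_re/g;

print "$_\n" for @uris;
```

This also shows why the original code found only one URL: without /g, a match in list context returns just the captures of the first match.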