php html output urlencode htmlspecialchars

How To Output User Submitted Links On Your Webpage Securely?


I want to let my website visitors (any Tom, Dick and Harry) submit links that will be displayed on my page. I need to parse the submitted URLs before echoing them, because I won't know in advance which URLs they will submit or how those URLs will be structured.

A user could, in theory, visit my page and inject some JavaScript code using, for example:

?search=<script>alert('hacked')</script>

You understand my point.
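
To make the risk concrete, here is a minimal sketch of what htmlspecialchars() does to that payload. The $search variable is only a stand-in for $_GET['search'], and ENT_QUOTES with UTF-8 is an assumed choice of flags, not something taken from the code further down:

```php
<?php
// Hypothetical value standing in for $_GET['search'].
$search = "<script>alert('hacked')</script>";

// ENT_QUOTES escapes single quotes as well as double quotes.
$safe = htmlspecialchars($search, ENT_QUOTES, 'UTF-8');

// The browser now shows the tag as text instead of running it.
echo $safe, "\n";
```

The escaped string contains no literal < or > any more, so it cannot open a script tag.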

I need to write a PHP script that parses each submitted URL and encodes it by applying urlencode(), rawurlencode() and intval() in the appropriate places before outputting it via htmlspecialchars(). Someone else wrote the following script. The problem is that it outputs this:

http%3A%2F%2Fexample.com%2Fcat%2Fsubcat?var_1=value+1&var2=2&this_other=thing&number_is=13

It should output like this:

http://example.com/cat/subcat?var_1=value+1&var2=2&this_other=thing&number_is=13

This is their code (third-party code):

<?php
function encodedUrl($url){
    $query_strings_array = [];
    $query_string_parts  = [];
    // parse URL & get query
    $scheme        = parse_url($url, PHP_URL_SCHEME);
    $host          = parse_url($url, PHP_URL_HOST);
    $path          = parse_url($url, PHP_URL_PATH);
    $query_strings = parse_url($url, PHP_URL_QUERY);

    // parse query into array
    parse_str($query_strings, $query_strings_array);

    // separate keys & values
   $query_strings_keys   = array_keys($query_strings_array);
   $query_strings_values = array_values($query_strings_array);

   // loop query
  for($i = 0; $i < count($query_strings_array); $i++){
       $k   = urlencode($query_strings_keys[$i]);
       $v   = $query_strings_values[$i];
       $val = is_numeric($v) ? intval($v) : urlencode($v);
    
       $query_string_parts[] = "{$k}={$val}";
   }

   // re-assemble URL
   $encodedHostPath = rawurlencode("{$scheme}://{$host}{$path}");

   return $encodedHostPath . '?' . implode('&', $query_string_parts);
}

$url1 = 'http://example.com/cat/subcat?var 1=value 1&var2=2&this other=thing&number is=13';
$url2 = 'http://example.com/autos/cars/list.php?state=california&max_price=50000';

// run urls thru function & echo
echo $encoded_url1 = encodedUrl($url1); echo '<br>'; 
echo $encoded_url2 = encodedUrl($url2); echo '<br>'; 
?>

So I changed this line of theirs:

$encodedHostPath = rawurlencode("{$scheme}://{$host}{$path}");

to this (my amendment):

$encodedHostPath = rawurlencode("{$scheme}").'://'.rawurlencode("{$host}").$path;

And it seems to work, as it now outputs:

http://example.com/cat/subcat?var_1=value+1&var2=2&this_other=thing&number_is=13

QUESTION 1: I am not sure whether I put rawurlencode() in the right places, so please check. Also, shouldn't $path be passed through rawurlencode() as well, like so?

rawurlencode($path)

Note, however, that:

rawurlencode($path)

doesn't output correctly.
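
A likely reason is that rawurlencode() also percent-encodes the "/" separators, so the whole path gets flattened into one encoded string. One possible way around it is to encode each path segment on its own and join the segments back with "/". This is only a sketch: encodePath() is a made-up helper name, and decoding each segment first (to avoid double-encoding input that is already encoded) is an assumption about what is wanted:

```php
<?php
// Hypothetical helper: percent-encode each path segment, keep the "/" separators.
function encodePath(string $path): string
{
    $segments = explode('/', $path);

    $encoded = array_map(function ($segment) {
        // Decode first so an already-encoded segment is not encoded twice.
        return rawurlencode(rawurldecode($segment));
    }, $segments);

    return implode('/', $encoded);
}

echo rawurlencode('/cat/sub cat'), "\n"; // %2Fcat%2Fsub%20cat
echo encodePath('/cat/sub cat'), "\n";   // /cat/sub%20cat
```

With this, $path could be replaced by encodePath($path) when the URL is re-assembled, and the slashes stay intact.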

QUESTION 2: I further updated their code to a new version, and it does not output correctly. Why is that? Where am I going wrong? All I did was add a few lines. This is my update (NEW VERSION), which outputs the wrong result, like this:

http%3A%2F%2Fexample.com%2Fcat%2Fsubcat?var_1=value+1&var2=2&this_other=thing&number_is=13

I added a few lines of my own at the bottom of their code.

MY UPDATE (NEW VERSION):

<?php
function encodedUrledited($url){
    $query_strings_array = [];
    $query_string_parts  = [];
    // parse URL & get query
    $scheme        = parse_url($url, PHP_URL_SCHEME);
    $host          = parse_url($url, PHP_URL_HOST);
    $path          = parse_url($url, PHP_URL_PATH);
    $query_strings = parse_url($url, PHP_URL_QUERY);

    // parse query into array
    parse_str($query_strings, $query_strings_array);

    // separate keys & values
   $query_strings_keys   = array_keys($query_strings_array);
   $query_strings_values = array_values($query_strings_array);

   // loop query
  for($i = 0; $i < count($query_strings_array); $i++){
       $k   = urlencode($query_strings_keys[$i]);
       $v   = $query_strings_values[$i];
       $val = is_numeric($v) ? intval($v) : urlencode($v);
    
       $query_string_parts[] = "{$k}={$val}";
   }

   // re-assemble URL
   $encodedHostPath = rawurlencode("{$scheme}").'://'.rawurlencode("{$host}").$path;
   
   return $encodedHostPath . '?' .implode('&', $query_string_parts);
}

if(!ISSET($_POST['url1']) && empty($_POST['url1']) && !ISSET($_POST['url2']) && empty($_POST['url2']))
{
    //Default Values for Substituting empty User Inputs.
    $url1 = 'http://example.com/cat/subcat?var 1=value 1&var2=2&this other=thing&number is=138';
    $url2 = 'http://example.com/autos/cars/list.php?state=california&max_price=500008';
}
else
{
    //User has made following inputs...
    $url1 = $_POST['url1'];
    $url2 = $_POST['url2'];
    
    //Encode User's Url inputs. (Add rawurlencode(), urlencode() and intval() in user's submitted url where appropriate).
    $encoded_url1 = encodedUrledited($url1);
    $encoded_url2 = encodedUrledited($url2);
}

echo $link1 = '<a href=' .htmlspecialchars($encoded_url1) .'>' .htmlspecialchars($encoded_url1) .'</a>';
echo '<br/>';
echo $link2 = '<a href=' .htmlspecialchars($encoded_url2) .'>' .htmlspecialchars($encoded_url2) . '</a>';
echo '<br>';

?>

This thread is really about the second script, my update.

Thank You!


Solution

  • I fixed my code. Answering my own question.

    Fixed Code:

    <?php
    function encodedUrledited($url){
        $query_strings_array = [];
        $query_string_parts  = [];
        // parse URL & get query
        $scheme        = parse_url($url, PHP_URL_SCHEME);
        $host          = parse_url($url, PHP_URL_HOST);
        $path          = parse_url($url, PHP_URL_PATH);
        $query_strings = parse_url($url, PHP_URL_QUERY);
    
        // parse query into array
        parse_str($query_strings, $query_strings_array);
    
        // separate keys & values
       $query_strings_keys   = array_keys($query_strings_array);
       $query_strings_values = array_values($query_strings_array);
    
       // loop query
      for($i = 0; $i < count($query_strings_array); $i++){
           $k   = $query_strings_keys[$i];
           $key = is_numeric($k) ? intval($k) : urlencode($k);
           
           $v   = $query_strings_values[$i];
           $val = is_numeric($v) ? intval($v) : urlencode($v);
        
           $query_string_parts[] = "{$key}={$val}";
       }
    
       // re-assemble URL
       $encodedHostPath = rawurlencode($scheme).'://'.rawurlencode($host).$path;   
       $encodedHostPath .= '?' .implode('&', $query_string_parts);
       
       return $encodedHostPath;
    }
    
    if(!ISSET($_POST['url1']) && empty($_POST['url1']) && !ISSET($_POST['url2']) && empty($_POST['url2']))
    {
        //Default Values for Substituting empty User Inputs.
        $url1 = 'http://example.com/cat/subcat?var 1=value 1&var2=2&this other=thing&number is=138';
        $url2 = 'http://example.com/autos/cars/list.php?state=california&max_price=500008';
    }
    else
    {
        //User has made following inputs...
        $url1 = $_POST['url1'];
        $url2 = $_POST['url2'];
        
        //Encode User's Url inputs. (Add rawurlencode(), urlencode() and intval() in user's submitted url where appropriate).
    }
    
    $encoded_url1 = encodedUrledited($url1);
    $encoded_url2 = encodedUrledited($url2);
    
    // Quote the href value so the attribute cannot be cut short by an unexpected character.
    $link1 = '<a href="' .htmlspecialchars($encoded_url1) .'">' .htmlspecialchars($encoded_url1) .'</a>';
    $link2 = '<a href="' .htmlspecialchars($encoded_url2) .'">' .htmlspecialchars($encoded_url2) . '</a>';
    
    echo $link1; echo '<br/>';
    echo $link2; echo '<br/>';
    
    ?>
    

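    The following two lines were supposed to be outside the ELSE block, but they weren't, which caused the whole issue: when the default values were used, $encoded_url1 and $encoded_url2 were never set. I moved them outside the ELSE and the script now works fine.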

    $encoded_url1 = encodedUrledited($url1);
    $encoded_url2 = encodedUrledited($url2);
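
One more thing that may be worth checking before echoing a submitted link: encoding alone does not stop a javascript: URL from ending up in the href. A rough sketch, assuming only http and https links are wanted; safeLink() is a made-up helper name and the allow-list is an assumption, not part of the fixed code:

```php
<?php
// Hypothetical helper: returns an <a> tag, or null when the scheme is not allowed.
function safeLink(string $encodedUrl): ?string
{
    $scheme = parse_url($encodedUrl, PHP_URL_SCHEME);

    // Assumed allow-list; rejects javascript:, data: and URLs without a scheme.
    if (!is_string($scheme) || !in_array(strtolower($scheme), ['http', 'https'], true)) {
        return null;
    }

    $escaped = htmlspecialchars($encodedUrl, ENT_QUOTES, 'UTF-8');

    // The href value is quoted so it cannot be ended early by a space.
    return '<a href="' . $escaped . '">' . $escaped . '</a>';
}

$link = safeLink('http://example.com/cat/subcat?var_1=value+1&var2=2');
echo $link ?? 'Link rejected', "\n";

var_dump(safeLink('javascript:alert(1)')); // NULL
```

The result of encodedUrledited() could be passed through a check like this instead of being concatenated into the tag directly, with rejected links skipped or reported back to the user.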