php html-encode htmlspecialchars

Sanitizing the request body when making a POST request to an API


I would like to use htmlspecialchars to sanitize data before making a POST request, but I keep getting this error:

url=*** - Uncaught TypeError: http_build_query(): Argument #1 ($data) must be of type array, string given

This is the function related to this error and how it is getting triggered:

function makePostRequest($baseURL) {
    $ch = curl_init();
    $clean_post =  htmlspecialchars($POST);
    $data = http_build_query($clean_post);
    curl_setopt($ch, CURLOPT_URL, $baseURL);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $response = curl_exec($ch);
    curl_close($ch);

    if($e = curl_error($ch)) {
        echo $e;
    } else {
        $json = json_decode($response, true);
        return print_r($json);
    }
}
...
$response = "";
switch (getRequestMethod()) {
  case 'GET':
    $response = makeGetRequest($baseURL);
    break;
  case 'POST':
    $response = makePostRequest($baseURL);
    break;
  default:
    echo "There has been an error";
    return;
}

This is a sample of the data I am sending as part of the POST request:

    data = {
        name:'***',
        password: '***',
        userID: emailAddress,
        userSecret: password
    }
    console.log('data', data)
    jQuery.ajax({
        type: "POST",
        url: "proxy.php?url=***",
        dataType: "json",
        contentType: 'application/x-www-form-urlencoded',
        data: data,
        success: function (data){
            console.log('success', data)
        }
    });
});

Solution

  • The immediate cause of the error is that htmlspecialchars returns a string, whereas http_build_query expects an array, as the error message points out. However, there's no point trying to fix that directly, because you shouldn't be doing this to begin with.
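
    The type mismatch can be reproduced in isolation. This is a minimal sketch assuming PHP 8, where the mismatch raises a TypeError; the sample field values are hypothetical and not taken from the question.

```php
<?php
// Hypothetical sample fields standing in for the submitted form data.
$fields = ['name' => 'Ann', 'userID' => 'ann@example.com'];

// htmlspecialchars() takes one string and returns one string.
$encoded = htmlspecialchars('Ann & Bob');
echo gettype($encoded), "\n"; // string

// http_build_query() takes an array and returns a URL-encoded string.
$query = http_build_query($fields);
echo $query, "\n"; // name=Ann&userID=ann%40example.com

// Passing it the string instead reproduces the TypeError from the question.
$message = '';
try {
    http_build_query($encoded);
} catch (TypeError $e) {
    $message = $e->getMessage();
}
echo $message, "\n";
```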

    Your usage of htmlspecialchars() is inappropriate and potentially problematic. htmlspecialchars() is an output filter, to be used only when outputting data into an HTML document. It is designed only to help protect against XSS attacks - which can only occur in an HTML document loaded into a web browser with JavaScript enabled.
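
    For contrast, this is roughly what the intended use looks like: encoding a value at the moment it is written into HTML. The $comment value is a hypothetical example, not something from the question.

```php
<?php
// Hypothetical untrusted value, e.g. a user comment loaded from storage.
$comment = '<script>alert(1)</script>';

// Encode at the point of output into HTML, not earlier.
$html = '<p>' . htmlspecialchars($comment, ENT_QUOTES, 'UTF-8') . '</p>';

echo $html, "\n";
// <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```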

    It should not be used at any other time, such as when receiving input data - at worst it can change or corrupt your data unnecessarily in that situation. It also has nothing to do with sending data in an HTTP request. See also the related question "when to use htmlspecialchars() function?".

    You're not writing this data to an HTML document which is going to be displayed in a browser, so there is no need to HTML-encode the data, or to try to "sanitise" it against anything else (e.g. SQL injection, as you mentioned in the comments) that you aren't directly using it for.
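
    One way the proxy function could look if it simply forwards the data unchanged is sketched below. The extra $fields parameter, the buildPostBody() helper and the array return shape are assumptions made for illustration; they are not part of the original code. The sketch also reads curl_error() while the handle is still in use, rather than after curl_close() as in the question.

```php
<?php
// Hypothetical helper: URL-encoding is the only encoding a form-encoded body needs.
function buildPostBody(array $fields): string
{
    return http_build_query($fields);
}

// Sketch only: forwards the submitted fields without altering them.
function makePostRequest(string $baseURL, array $fields): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $baseURL);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, buildPostBody($fields));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $response = curl_exec($ch);

    // Ask for the error before the handle is released.
    // (curl_close() has no effect since PHP 8.0, so it is omitted here.)
    if ($response === false) {
        return ['ok' => false, 'error' => curl_error($ch)];
    }

    // Return the decoded body and let the caller decide how to output it.
    return ['ok' => true, 'data' => json_decode($response, true)];
}

echo buildPostBody(['password' => 'a<b&c']), "\n"; // password=a%3Cb%26c
```

    It would be called as makePostRequest($baseURL, $_POST), so the submitted values reach the API exactly as the user typed them.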

    You also have no idea whether the third party whose API you are sending it to will try to put any of the data you provide into an HTML or SQL context, or anything else. If they don't, there's nothing for anyone to worry about. And if they do, then it's their responsibility to deal with the data accordingly at the right moment. They should be treating your application as a potential threat - you're providing input data they don't control, and they don't know how it got there, where it came from, or how you've processed it.

    If you prematurely HTML-encode data which isn't going anywhere near an HTML document, you simply risk corrupting it. For example, imagine I used the character < in my password, which ought to be a perfectly legitimate thing to do. Using htmlspecialchars on that would alter it without my knowledge, meaning I no longer know my real password, and the alteration wouldn't achieve anything useful. And you cannot sanitise for SQL injection here either, because that involves writing parameterised queries, and in this case you're not the one writing the query code.
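
    The corruption is easy to demonstrate. In this small sketch the password value is hypothetical: HTML-encoding changes it, whereas the URL encoding done by http_build_query is undone again when the form body is parsed, so the value survives the round trip.

```php
<?php
// Hypothetical password containing characters that htmlspecialchars() rewrites.
$password = 'p<ss&word';

// HTML-encoding silently changes the value that would be sent to the API.
$htmlEncoded = htmlspecialchars($password);
echo $htmlEncoded, "\n";              // p&lt;ss&amp;word
var_dump($htmlEncoded === $password); // bool(false)

// URL encoding is transport-level only: parsing the body restores the original.
$body = http_build_query(['password' => $password]);
echo $body, "\n";                     // password=p%3Css%26word

parse_str($body, $decoded);
var_dump($decoded['password'] === $password); // bool(true)
```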


    P.S.

    I actually once created an account with a large commercial organisation which provides services to the general public, and they silently stripped a # character from the password I provided at registration, meaning I couldn't log in properly. Knowing what I know about these processes, I eventually guessed what had happened and tried the same password without that character and it logged me in. So people actually do this stuff in real life and it causes real problems - anyone without relevant experience would have no idea why it wasn't working.

    I reported it to their helpdesk, and their developers initially had no idea what I was referring to. That was worrying in itself: an organisation of that size did not understand what it was doing to the data. Eventually I think they fixed it, but I leave this anecdote here just to demonstrate that these are real issues, and not just technical pedantry.