php cakephp sanitization

CakePHP - Sanitize::clean leaving &amp; and \n


I'm a little confused about what exactly CakePHP's Sanitize::clean() method should do. Currently, when I'm adding a record, I'm doing this:

$this->request->data = Sanitize::clean($this->request->data, array('encode' => true, 'remove_html' => true));

However, this still leaves &amp; and \n in my database when users type & and press Enter in a textarea. How can I stop this? I thought 'remove_html' => true would have done this?

Do I need to go as far as doing a str_replace()?

Also, some of the records with \n in them have hundreds of trailing backslashes, which means they break any views they are displayed in.

Could someone point me in the right direction? Thanks

Update as per Nunser's answer

I've now added the following after the clean:

foreach ($this->request->data['Expense'] as &$expense) {
    $expense['detail'] = Sanitize::stripWhitespace($expense['detail']);         
}
unset($expense);

This does remove whitespace, but it still leaves lots of \n\n\n\n\n\

Here's a debug of $this->request->data:

array(
    'submit' => 'Save',
    'Expense' => array(
        (int) 0 => array(
            'date' => array(
                'day' => '27',
                'month' => '06',
                'year' => '2013'
            ),
            'sitename' => 'test',
            'detail' => 'test test\n\ntest\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n',
            'ExpenseCode' => array(
                'ExpenseCode' => '1'
            ),
            'billable' => '1',
            'amount' => '1',
            'miles' => '',
            'total' => '1',
            'billable_mileage' => '',
            'recorded_global_mileage_rate' => '0.4'
        ),
        (int) 1 => array(
            'date' => array(
                'day' => '27',
                'month' => '06',
                'year' => '2013'
            ),
            'sitename' => 'test',
            'detail' => '\n\n\n\n\n\n\n\n\n\n\n\n\n\ntest',
            'ExpenseCode' => array(
                'ExpenseCode' => '4'
            ),
            'billable' => '1',
            'amount' => '10',
            'miles' => '',
            'total' => '10',
            'billable_mileage' => '',
            'recorded_global_mileage_rate' => '0.4'
        )
    ),
    'CashFloat' => array(
        'amount' => ''
    ),
    'ExpenseClaim' => array(
        'user_id' => '3',
        'claim_status_id' => '1'
    )
)

I'd really like to strip those \n's out, as I don't want them to break views.

More results

Even when I cut out the Cake function and use its code directly inline:

$expense['detail'] = preg_replace('/\s{2,}/u', ' ', preg_replace('/[\n\r\t]+/', '', $expense['detail']));

I still get the same result (debug($expense['detail']) from inside the loop):

'test 10 spaces before this then pressing enter lots \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n'

I've also tried just a basic trim(), which didn't work at all.

Working solution (apart from the &)

This will remove any number of literal \n sequences from the string:

foreach ($this->request->data['Expense'] as &$expense) {

    $expense['detail'] = str_replace("\\n", ' ', $expense['detail']);
    $expense['detail'] = Sanitize::stripWhitespace($expense['detail']);             
}
// Unset the reference variable from the loop above
unset($expense);

Decided to keep the &amp;

And just use html_entity_decode() when echoing it out in a view.
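
For anyone following along, the encode/decode round trip looks roughly like this in plain PHP. This is only a minimal sketch, not CakePHP's own code: the sample string and variable names are made up, and it assumes the stored value was encoded with htmlentities() using ENT_QUOTES and UTF-8.

```php
<?php
// Hypothetical value as typed into the textarea.
$typed = 'Fish & chips';

// Roughly what ends up stored after entity encoding (assumed flags and charset).
$stored = htmlentities($typed, ENT_QUOTES, 'UTF-8');

// Decode the entities again when reading the value back.
$decoded = html_entity_decode($stored, ENT_QUOTES, 'UTF-8');

echo $stored, PHP_EOL;  // Fish &amp; chips
echo $decoded, PHP_EOL; // Fish & chips
```

Keep in mind that the decoded value is raw user input again, so it should still be escaped (for example with htmlspecialchars() or CakePHP's h()) before it is printed into HTML.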

Hope that helps someone!


Solution

  • According to the docs, the clean method does not do what you want.

    First, for the \n characters, you should call another Sanitize function. Sanitize::clean() has a carriage option, but that only removes the \r characters. So, after (or before) calling the clean method, call stripWhitespace. Unfortunately, this function only accepts a string, so you need to call it in a loop (or, if you know which string you want to sanitize, use it for just that one).

    $cleanOfSpacesString = Sanitize::stripWhitespace($dirtyString);
    

    That function removes these characters: \r, \n and \t, and replaces two or more spaces with just one.
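
    To make that behaviour concrete, here is a rough plain-PHP stand-in built from the two preg_replace calls quoted in the question. The function name strip_whitespace_sketch and the sample string are hypothetical, and this is not a copy of CakePHP's source.

    ```php
    <?php
    // Hypothetical stand-in mirroring the two preg_replace calls quoted in the question.
    function strip_whitespace_sketch($str)
    {
        // Drop real newlines, carriage returns and tabs entirely.
        $withoutBreaks = preg_replace('/[\n\r\t]+/', '', $str);
        // Collapse runs of two or more whitespace characters into one space.
        return preg_replace('/\s{2,}/u', ' ', $withoutBreaks);
    }

    // Double quotes, so \n and \t are real control characters here.
    $dirty = "test  test\n\ntest\t\n";
    echo strip_whitespace_sketch($dirty), PHP_EOL; // test testtest
    ```

    Note that words separated only by a newline end up glued together ("testtest"), because the line breaks are replaced with an empty string rather than a space.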

    As for the &amp;, the docs say that remove_html removes HTML tags and also runs htmlentities($string, ENT_QUOTES, $defaultCharset, true) on the string. So if that htmlentities call is not working for you, you'll have to use another function that is not included in the Sanitize class.

    If you think this behaviour is something you want to have built in, extend the Sanitize class to call stripWhitespace inside the clean function and to replace the htmlentities call with something that works for you. If it's just for this case, add those calls in the controller, after calling Sanitize::clean.


    Update as per Jason's update to my answer

    I found it weird that the stripWhitespace function didn't work as you wanted. Here is my explanation of why I think it didn't.

    You want to remove \n from a string like this

    'test test\n\ntest\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n'
    

    and the preg_replace inside stripWhitespace is this

    preg_replace('/[\n\r\t]+/', '', $str)
    

    That regex does remove newlines, carriage returns and tabs, but only from a string like this

    'test test
    
    test
    
    ' //and a lot more newlines
    

    So the difference is that your newlines (and maybe the tabs and returns too, I don't know) arrive as literal text, a real backslash followed by n, rather than as actual newline characters, so the preg_replace Cake runs doesn't help you. I know you found a solution for that (I recommend you also match the tab and return sequences there), but I wanted to clarify why Cake's function didn't work in case someone else has a similar problem.
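
    The difference can be reproduced in plain PHP. The snippet below is only a sketch: strip_breaks_sketch is a hypothetical helper, not part of CakePHP, and it assumes that a literal backslash followed by n, r or t never appears as legitimate content (it would also mangle text such as a Windows path).

    ```php
    <?php
    // Single quotes: this holds a backslash followed by "n", not a newline.
    $literal = 'test test\n\ntest\n\n\n';

    // The character class only matches real control characters, so nothing changes.
    $unchanged = preg_replace('/[\n\r\t]+/', '', $literal);

    // Hypothetical helper: replace both real and literal \n, \r and \t with a space.
    function strip_breaks_sketch($str)
    {
        // \\\\ in a single-quoted PHP string reaches PCRE as \\, i.e. one literal backslash.
        $spaced = preg_replace('/(?:\\\\[nrt]|[\n\r\t])+/', ' ', $str);
        // Collapse leftover runs of whitespace and trim the ends.
        return trim(preg_replace('/\s{2,}/u', ' ', $spaced));
    }

    var_dump($unchanged === $literal);           // bool(true)
    echo strip_breaks_sketch($literal), PHP_EOL; // test test test
    ```

    Replacing with a space instead of an empty string keeps words on separate lines from running together.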