javascript regex string xpages sanitization

Is there a better way to sanitize input with JavaScript?


I wanted to write a JavaScript function that sanitizes user input by removing any unwanted or dangerous characters.

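It must allow only the following characters: the letters a-z (in either case), the digits 0-9, the accented letters áéíóúñü, spaces, and the punctuation characters _ - . ,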

My first attempt was:

function sanitizeString(str){
    str = str.replace(/[^a-z0-9áéíóúñü_-\s\.,]/gim,"");
    return str.trim();
}

But when I call:

sanitizeString("word1\nword2")

it returns:

"word1
word2"

So I had to rewrite the function to explicitly remove \t\n\f\r\v\0:

function sanitizeString(str){
    str = str.replace(/([^a-z0-9áéíóúñü_-\s\.,]|[\t\n\f\r\v\0])/gim,"");
    return str.trim();
}

I'd like to know:

  1. Is there a better way to sanitize input with JavaScript?
  2. Why don't \n and \t match in the first version of the RegExp?

Solution

  • The new version of the sanitizeString function:

    function sanitizeString(str){
        str = str.replace(/[^a-z0-9áéíóúñü \.,_-]/gim,"");
        return str.trim();
    }
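
    As a quick check of the new version, here is a minimal sketch that repeats the function so it runs on its own. The sample inputs are made up for illustration and are not from the original question, apart from "word1\nword2":

```javascript
// Same function as above, repeated so the snippet runs on its own.
function sanitizeString(str) {
    str = str.replace(/[^a-z0-9áéíóúñü \.,_-]/gim, "");
    return str.trim();
}

// The newline is no longer allowed, so it is stripped.
console.log(sanitizeString("word1\nword2"));              // word1word2

// Accented letters, commas and underscores survive; "!" is removed
// and the surrounding spaces are trimmed.
console.log(sanitizeString("  Hola, señor_López!  "));    // Hola, señor_López

// Angle brackets, parentheses and slashes are removed.
console.log(sanitizeString("<script>alert(1)</script>")); // scriptalert1script
```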
    

    The main problem was pointed out by @RobG and @Derek (@RobG, write your comment as an answer and I will accept it): \s doesn't mean what w3Schools now says:

    Find a whitespace character

    It means what MDN says

    Matches a single white space character, including space, tab, form feed, line feed. Equivalent to [ \f\n\r\t\v\u00a0\u1680\u180e\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000].

    I trusted w3Schools when I wrote the function.
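
    To make the point about \s concrete, here is a small sketch (the variable names are only for illustration) showing that \s matches newline and tab as well as the plain space. Since \s sits inside the negated character class of the first version, those characters counted as allowed and were left in place:

```javascript
// \s covers far more than the plain space character.
const whitespace = /\s/;

const matchesNewline = whitespace.test("\n"); // true
const matchesTab = whitespace.test("\t");     // true
const matchesSpace = whitespace.test(" ");    // true

// The first version of the pattern from the question: the newline is
// matched by \s inside the negated class, so it is not removed.
const firstVersion = /[^a-z0-9áéíóúñü_-\s\.,]/gim;
const result = "word1\nword2".replace(firstVersion, "");

console.log(JSON.stringify(result)); // "word1\nword2"
```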

    A second change was to move the dash character (-) to the end of the character class so that it is not interpreted as a range separator.
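
    The difference the dash position makes can be seen in a small sketch. The two patterns below are simplified examples chosen for illustration, not the patterns from the function:

```javascript
// Between two characters, "-" defines a range: [_-a] spans the code
// points from "_" (95) to "a" (97), which includes the backtick (96).
const dashAsRange = /[_-a]/;

// At the end of the class, "-" is a literal dash.
const dashAsLiteral = /[_a-]/;

console.log(dashAsRange.test("`"));   // true  (inside the range)
console.log(dashAsRange.test("-"));   // false (the dash itself is not matched)
console.log(dashAsLiteral.test("`")); // false
console.log(dashAsLiteral.test("-")); // true
```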