javascript regex character-class

How is the word character \w interpreted inside a character class?


\w stands for the character class [A-Za-z0-9_].

But I am not able to understand how it is interpreted inside a character class.

So when I use

[\w-~]

let test = (str) => /^[\w-~]+$/.test(str)

console.log(test("T|"))

it returns false for T|,

but when I use

[A-Za-z0-9_-~]

let test = (str) => /^[A-Za-z0-9_-~]+$/.test(str)

console.log(test("T|"))

it returns true.

I am not able to understand how these two expressions differ from each other.


Solution

  • The main difference between your two examples is what sits immediately before the - character. In this example:

    let test = (str) => /^[A-Za-z0-9_-~]+$/.test(str)
    
    console.log(test("T|"))
    

    The trailing _-~ is evaluated as a range from _ (U+005F) to ~ (U+007E), and | (U+007C) falls inside that range. A reduced version:

    let test = (str) => /^[_-~]+$/.test(str)
    
    console.log(test("|"))
    

    also returns true.

    Whereas in this one:

    let test = (str) => /^[\w-~]+$/.test(str)
    
    console.log(test("T|"))
    
    

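    \w is already a set of characters rather than a single character, so it cannot be the start of a range. The - that follows it is therefore treated as a literal hyphen, and the class matches word characters, -, and ~, but not |.

    The position of - and what surrounds it can make a big difference in how it is interpreted.

    You can avoid the situation altogether by moving the - to the end of the class, like so: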

    let test = (str) => /^[A-Za-z0-9_~-]+$/.test(str)
    
    console.log(test("T|"))
    

    which returns false.
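
To see the explanation above in one place, here is a small sketch that compares several spellings of the class against the same input. The object name `patterns` and its keys are made up for illustration, and the escaped-hyphen variant and the `u`-flag check are additions that the answer above does not mention. It assumes a standard engine such as Node.js or a current browser.

```javascript
// Code points that bound the accidental range `_-~`.
const lo = "_".charCodeAt(0);   // start of the range
const hi = "~".charCodeAt(0);   // end of the range
const pipe = "|".charCodeAt(0); // the character that slipped through

// "|" lies between "_" and "~", so the range `_-~` matches it.
const inRange = lo <= pipe && pipe <= hi;

// Several spellings of the class (names are illustrative only).
const patterns = {
  shorthandThenHyphen: /^[\w-~]+$/,     // \w, literal "-", literal "~"
  accidentalRange: /^[A-Za-z0-9_-~]+$/, // contains the range "_" to "~"
  hyphenLast: /^[A-Za-z0-9_~-]+$/,      // "-" at the end is always literal
  hyphenEscaped: /^[\w\-~]+$/,          // "\-" is always literal
};

// Run every pattern against the input from the question.
const results = Object.fromEntries(
  Object.entries(patterns).map(([name, re]) => [name, re.test("T|")])
);

// In Unicode mode (`u` flag), using \w as a range endpoint is rejected
// outright instead of being silently read as a literal hyphen.
let unicodeModeRejects = false;
try {
  new RegExp("^[\\w-~]+$", "u");
} catch (e) {
  unicodeModeRejects = e instanceof SyntaxError;
}

console.log(inRange, results, unicodeModeRejects);
```

Only `accidentalRange` accepts "T|". Escaping the hyphen or placing it last keeps it literal no matter what precedes it, and the `u` flag turns the ambiguous `[\w-~]` into an error at construction time, which can make this kind of mistake easier to catch.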