javascript regex svg raphael snap.svg

Need help extracting numbers from string in JavaScript


I need a rock-solid RegExp to work around an issue in how Raphael.js's parseStringPath processes arc path commands, and possibly others (SnapSVG also inherits the problem). The arcTo path command accepts 7 values (coordinates and settings), but some path strings look malformed because of extreme optimization, with the separators between values dropped. The browser doesn't flag them; it renders them properly. Check Raphael.js demo here.

Have a look at this example. I'm using the RegExp from Raphael.js together with a very simplistic RegExp of my own called incorrectReg (spelled incorectReg in the code), which tries to break strings like 000 into [0,0,0] or 011 into [0,1,1].

let spaces = "\x09\x0a\x0b\x0c\x0d\x20\xa0\u1680\u180e\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u202f\u205f\u3000\u2028\u2029",
    pathValues = new RegExp(`(-?\\d*\\.?\\d*(?:e[\\-+]?\\d+)?)[${spaces}]*,?[${spaces}]*`, `ig`),
    incorectReg = new RegExp(`([${spaces}]*0(?=[a-z0-9])|([${spaces}]\\0)*0(?=[a-z0-9]*))`, `ig`); // THIS ONE

function action(){
  let input = document.getElementById('input'),
      output = document.getElementById('output'),
      pathValue = input.getAttribute('d'),
      segments = pathValue.replace(/([a-z])/gi,'|$1').split('|').filter(x=>x.trim()),
      pathArray = []
      
  segments.map(x=>{
    let pathCommand = x[0],
        pathParams = x.replace(pathCommand,'').trim()
        
    pathArray.push( [pathCommand].concat(
      pathParams.replace(',',' ')
                .replace(pathValues,' $1 ')
                .replace(incorectReg,'$1 ')
                .split(' '))
                .filter(x=>x)
    );
  })
  output.setAttribute('d',pathArray.map(x=>x.join(' ')).join(''))

  console.table(pathArray)
}
svg {max-width:49%}
<button onclick="action()">Extract</button>
<hr>
<svg viewBox="0 0 16 16">
  <path id="input" d="M2,0a2 2 0 00,-2 2a2 2 0 002 2a.5.5 0 011 0z" stroke="red" stroke-width="1px" fill="none"></path>
</svg>

<svg viewBox="0 0 16 16">
  <path id="output" d="M0 0" stroke="green" stroke-width="1" fill="none"></path>
</svg>

As you can see in your browser console, the 000 group is already handled (it is obviously not a valid number, boolean, or anything specific). What remains is to handle 011 and 11; all of these groups are in fact strings of booleans.

So again, the arcTo path command takes these parameters:

arcTo -> ['A', rx,    ry,    xAxisRotation, largeArcFlag,  sweepFlag,     x,     y]
       // str, float, float, float,         boolean (0|1), boolean (0|1), float, float

I need a better incorrectReg RegExp, or a combination of solutions, to properly handle arcTo in particular as well as other similar cases. I'm open to any suggestions.

Thank you


Solution

  • Following the discussion in the comments under the question, I suggest not using a single regexp but a proper parser (or lexer, or tokenizer, whichever the correct term is).

    You can

    I am not even sure whether such a "super" regexp is possible to create. In any case, you can still use smaller "sub" regexps within the parsing process :-)
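To illustrate the tokenizer idea, here is a minimal sketch, not Raphael's or Snap's actual implementation. The names tokenizePath, PARAM_COUNTS, NUMBER and SEPARATOR are hypothetical. It assumes the usual SVG parameter counts per command, reads the 4th and 5th arcTo parameters as single-character flags (0 or 1), and uses small "sub" regexps only to read one token at a time.

```javascript
// Number of parameters each path command takes (assumed from the SVG path grammar).
const PARAM_COUNTS = { m: 2, l: 2, h: 1, v: 1, c: 6, s: 4, q: 4, t: 2, a: 7, z: 0 };

// One number at the start of the input: optional sign, digits with an optional
// fraction (or a bare fraction such as ".5"), optional exponent.
const NUMBER = /^[-+]?(?:\d+\.?\d*|\.\d+)(?:e[-+]?\d+)?/i;
// Optional whitespace and at most one comma between two values.
const SEPARATOR = /^\s*,?\s*/;

function tokenizePath(d) {
  const segments = [];
  let rest = d.trim();
  let command = null; // current command letter
  let params = [];    // values collected so far for the current segment

  while (rest.length > 0) {
    // A command letter starts a new segment.
    const letter = /^[mlhvcsqtaz]/i.exec(rest);
    if (letter) {
      if (params.length) throw new Error(`Incomplete "${command}" segment`);
      command = letter[0];
      rest = rest.slice(1).replace(/^\s*/, '');
      if (PARAM_COUNTS[command.toLowerCase()] === 0) segments.push([command]);
      continue;
    }

    const expected = command === null ? 0 : PARAM_COUNTS[command.toLowerCase()];
    if (expected === 0) throw new Error(`Unexpected input at "${rest}"`);

    // In an arc, parameters 4 and 5 (indexes 3 and 4) are flags: exactly one
    // character each, so "011" reads as flag 0, flag 1, then the number 1.
    const isFlag = command.toLowerCase() === 'a' &&
      (params.length === 3 || params.length === 4);
    const match = (isFlag ? /^[01]/ : NUMBER).exec(rest);
    if (!match) throw new Error(`Cannot read a value at "${rest}"`);

    params.push(Number(match[0]));
    rest = rest.slice(match[0].length).replace(SEPARATOR, '');

    if (params.length === expected) {
      segments.push([command, ...params]);
      params = [];
      // Extra coordinate pairs after a moveTo are implicit lineTo commands.
      if (command === 'M') command = 'L';
      else if (command === 'm') command = 'l';
    }
  }

  if (params.length) throw new Error(`Incomplete "${command}" segment`);
  return segments;
}

// The path from the question:
const parsed = tokenizePath('M2,0a2 2 0 00,-2 2a2 2 0 002 2a.5.5 0 011 0z');
console.log(parsed.map(s => s.join(' ')).join(''));
// M 2 0a 2 2 0 0 0 -2 2a 2 2 0 0 0 2 2a 0.5 0.5 0 0 1 1 0z
```

Because the tokenizer knows which parameter it expects next, it needs no lookahead tricks to split 00, 002 or 011: the position inside the arc segment decides whether a character is a flag or the start of a number.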