I need a rock-solid RegExp to solve an issue with how Raphael.js parseStringPath processes arc path commands, and possibly others (SnapSVG inherits the problem as well). The arcTo path command takes 7 parameters (coordinates and settings), but some path strings look malformed because of extreme optimization. The browser doesn't flag them and renders them properly anyway. Check the Raphael.js demo here.
Have a look at this example. I'm using the RegExp from Raphael.js plus a very simplistic RegExp of my own called incorectReg, which tries to break strings like 000 into [0,0,0] or 011 into [0,1,1].
// whitespace characters allowed between path values
const spaces = "\x09\x0a\x0b\x0c\x0d\x20\xa0\u1680\u180e\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u202f\u205f\u3000\u2028\u2029",
  // number tokenizer from Raphael.js: sign, digits, fraction, exponent
  pathValues = new RegExp(`(-?\\d*\\.?\\d*(?:e[\\-+]?\\d+)?)[${spaces}]*,?[${spaces}]*`, `ig`),
  // my own attempt at splitting packed flag groups such as 000 or 011
  incorectReg = new RegExp(`([${spaces}]*0(?=[a-z0-9])|([${spaces}]\\0)*0(?=[a-z0-9]*))`, `ig`); // THIS ONE

function action() {
  const input = document.getElementById('input'),
    output = document.getElementById('output'),
    pathValue = input.getAttribute('d'),
    // split before every path command letter ('e' is excluded: it may belong to an exponent)
    segments = pathValue.replace(/([achlmqstvz])/gi, '|$1').split('|').filter(x => x.trim()),
    pathArray = [];

  segments.forEach(x => {
    const pathCommand = x[0],
      pathParams = x.slice(1).trim();
    pathArray.push([pathCommand].concat(
      pathParams.replace(/,/g, ' ')    // replace all commas, not just the first one
        .replace(pathValues, ' $1 ')   // pad each number with spaces
        .replace(incorectReg, '$1 ')   // try to unpack flag groups
        .split(' '))
      .filter(t => t));                // drop empty tokens
  });

  output.setAttribute('d', pathArray.map(x => x.join(' ')).join(''));
  console.table(pathArray);
}
svg {max-width:49%}
<button onclick="action()">Extract</button>
<hr>
<svg viewBox="0 0 16 16">
  <path id="input" d="M2,0a2 2 0 00,-2 2a2 2 0 002 2a.5.5 0 011 0z" stroke="red" stroke-width="1px" fill="none"></path>
</svg>
<svg viewBox="0 0 16 16">
  <path id="output" d="M0 0" stroke="green" stroke-width="1" fill="none"></path>
</svg>
As you can see in your browser console, the 000 group is already solved (it is obviously not a valid number, boolean, or anything specific). What remains is 011 and 11, where all these groups are in fact runs of boolean flags, sometimes fused with the coordinate that follows.
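If the current split-then-fix pipeline is kept, such a group can be resolved by its position rather than by its pattern: a token that starts at the largeArcFlag slot always begins with two one-character flags. Below is a minimal sketch under that assumption; splitFlagGroup is a hypothetical helper (not part of Raphael.js or SnapSVG), and it assumes the token was already split on whitespace and commas. Whatever follows the flags is returned as a string so the existing number tokenizer can handle it.

```javascript
// Hypothetical helper: takes a token that begins exactly at the largeArcFlag
// position of an arc command (e.g. "011" from "a.5.5 0 011 0") and peels off
// up to two single-character flags. Assumes no whitespace or commas inside.
function splitFlagGroup(group) {
  const flags = [];
  let i = 0;
  while (flags.length < 2 && i < group.length) {
    const c = group[i];
    // a flag is always exactly one character: '0' or '1'
    if (c !== '0' && c !== '1') {
      throw new Error(`expected flag 0|1 at index ${i} of "${group}"`);
    }
    flags.push(Number(c));
    i++;
  }
  // the remainder (if any) is the start of the x coordinate, left as a string
  return { flags, rest: group.slice(i) };
}
```

For example, "011" yields flags [0, 1] with rest "1", and "000" yields flags [0, 0] with rest "0".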
So again, the arcTo path command works with the following parameters:
arcTo -> ['A', rx, ry, xAxisRotation, largeArcFlag, sweepFlag, x, y]
// str, float, float, float, boolean (0|1), boolean (0|1), float, float
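To check whatever a tokenizer produces against the layout above, a small validator can help. This is a sketch under my own assumptions: checkArcParams is a hypothetical name, it accepts values as strings or numbers, and it also allows further groups of seven values after the command letter, since SVG lets one A/a command carry several arcs in a row.

```javascript
// Hypothetical validator for one parsed arc segment laid out as
// ['A', rx, ry, xAxisRotation, largeArcFlag, sweepFlag, x, y].
// Extra groups of 7 values (implicitly repeated arcs) are accepted too.
// Returns a list of problems; an empty list means the segment looks valid.
function checkArcParams(segment) {
  const problems = [];
  const [cmd, ...args] = segment;
  if (cmd !== 'A' && cmd !== 'a') problems.push(`not an arc command: ${cmd}`);
  if (args.length === 0 || args.length % 7 !== 0) {
    problems.push(`expected a multiple of 7 values, got ${args.length}`);
  }
  args.forEach((v, i) => {
    const n = Number(v);
    const slot = i % 7;
    if (slot === 3 || slot === 4) {
      // largeArcFlag and sweepFlag must be exactly 0 or 1
      if (n !== 0 && n !== 1) problems.push(`value ${i}: flag must be 0 or 1, got ${v}`);
    } else if (!Number.isFinite(n)) {
      problems.push(`value ${i}: not a number: ${v}`);
    }
  });
  return problems;
}
```

For instance, checkArcParams(['a', '.5', '.5', '0', '011', '0']) reports two problems (only 5 values, and 011 in the largeArcFlag slot), which is exactly the symptom of the packed-flag strings.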
I need a better incorectReg RegExp, or a combination of solutions, to properly handle mainly arcTo, as well as other similar cases. I'm open to any suggestions.
Thank you
Following the discussion below the question, I propose not using a regexp at all, but rather a proper parser (or lexer, or tokenizer, whatever the correct term is).
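A regexp struggles here because the same characters tokenize differently depending on context: in a.5.5 0 011 0 the run 011 is largeArcFlag 0, sweepFlag 1 and x 1, while after a lineto (L011 0) it is the single number 11. A parser that knows which kind of argument comes next avoids the problem. Below is a minimal sketch assuming the SVG 1.1 command set; parsePathData and ARG_KINDS are hypothetical names, not Raphael.js or SnapSVG API. It keeps implicitly repeated argument groups in one segment, like pathArray in the question, and treats any run of whitespace and commas as a separator, which is more lenient than the spec.

```javascript
// Argument kinds per SVG path command: 'n' = number, 'f' = one-character flag.
const ARG_KINDS = {
  m: 'nn', l: 'nn', h: 'n', v: 'n', c: 'nnnnnn', s: 'nnnn',
  q: 'nnnn', t: 'nn', a: 'nnnffnn', z: '',
};

function parsePathData(d) {
  let i = 0;
  const segments = [];

  const fail = (what) => {
    throw new SyntaxError(`expected ${what} at index ${i} in "${d}"`);
  };

  // Whitespace and commas only separate values.
  const skipSeparators = () => {
    while (i < d.length && /[\s,]/.test(d[i])) i++;
  };

  // Advances over a run of decimal digits; reports whether any were found.
  const digits = () => {
    const start = i;
    while (i < d.length && d[i] >= '0' && d[i] <= '9') i++;
    return i > start;
  };

  // number: [sign] (digits [. [digits]] | . digits) [(e|E) [sign] digits]
  const readNumber = () => {
    const start = i;
    if (d[i] === '+' || d[i] === '-') i++;
    let mantissa = digits();
    if (d[i] === '.') {
      i++;
      mantissa = digits() || mantissa;
    }
    if (!mantissa) {
      i = start;
      fail('number');
    }
    if (d[i] === 'e' || d[i] === 'E') {
      const beforeExp = i;
      i++;
      if (d[i] === '+' || d[i] === '-') i++;
      if (!digits()) i = beforeExp; // a bare "e" is not part of the number
    }
    return Number(d.slice(start, i));
  };

  // A flag is exactly one character, so "011" is never read as 11 here.
  const readFlag = () => {
    if (d[i] !== '0' && d[i] !== '1') fail('flag 0|1');
    return Number(d[i++]);
  };

  skipSeparators();
  while (i < d.length) {
    const cmd = d[i];
    const kinds = ARG_KINDS[cmd.toLowerCase()];
    if (kinds === undefined) fail('path command');
    i++;
    const segment = [cmd];
    // Several argument groups may follow one command letter (implicit repetition).
    do {
      for (const kind of kinds) {
        skipSeparators();
        segment.push(kind === 'f' ? readFlag() : readNumber());
      }
      skipSeparators();
    } while (kinds !== '' && i < d.length && /[0-9.+-]/.test(d[i]));
    segments.push(segment);
  }
  return segments;
}
```

With the path from the question, parsePathData('M2,0a2 2 0 00,-2 2a2 2 0 002 2a.5.5 0 011 0z') returns [['M',2,0], ['a',2,2,0,0,0,-2,2], ['a',2,2,0,0,0,2,2], ['a',0.5,0.5,0,0,1,1,0], ['z']].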
You can
I am not even sure such a "super" regexp is possible to create. Anyway, you can use "sub" regexps within the parsing process :-)
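The "sub" regexp idea could look like this: one small regexp per token type, each with the sticky flag (y) so it only matches at the current position, which is set through lastIndex. This is a sketch; makeScanner and readArcArgs are hypothetical names, and it reads a single set of arc arguments only.

```javascript
// One small regexp per token type. The sticky flag (y) makes each one match
// only at lastIndex, i.e. exactly at the scanner's current position.
const SEP = /[\s,]*/y;
const NUMBER = /[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?/y;
const FLAG = /[01]/y;

function makeScanner(text) {
  let pos = 0;

  // Skips separators, then requires `re` to match right at the new position.
  const take = (re, what) => {
    SEP.lastIndex = pos;
    SEP.exec(text); // always matches (possibly the empty string)
    pos = SEP.lastIndex;
    re.lastIndex = pos;
    const m = re.exec(text);
    if (m === null) throw new SyntaxError(`expected ${what} at index ${pos} in "${text}"`);
    pos = re.lastIndex;
    return m[0];
  };

  return {
    number: () => Number(take(NUMBER, 'number')),
    flag: () => Number(take(FLAG, 'flag 0|1')),
  };
}

// Reads one set of arc arguments: rx ry xAxisRotation largeArcFlag sweepFlag x y.
function readArcArgs(text) {
  const s = makeScanner(text);
  return [s.number(), s.number(), s.number(), s.flag(), s.flag(), s.number(), s.number()];
}
```

For example, readArcArgs('.5.5 0 011 0') gives [0.5, 0.5, 0, 0, 1, 1, 0]: the FLAG regexp consumes exactly one character per flag, so the trailing 1 is left for the x coordinate.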