I have JSONC as input (a superset of JSON that supports comments like //this one and /* this one */) and want to transform it into standard JSON using Python regex, but I'm not sure this can be solved with regex alone. I know it can be done with semantic processing, perhaps with something like tree-sitter, but I'm looking for a regex-based solution. Since we don't use /* */, a regex that only removes // comments is fine.
Note that:
Here is an example input with a failing sed attempt at the top:
//tried this sed -r 's#\s//[^}]*##'
// also tried this '%^*3//s39()'
[
{
"test1" : "http://test.com",
"test2" : "http://test.com",//test
// any thing
"ok" : 3, //here 2
"//networkpath1" : true, //whynot
"//networkpath2" : true
// ok
},//eof
{
"statement" : "I like test cases"
}//eof
]
Here is another failing attempt:
comment_re = re.compile(r'\s//[^}]*')
cleaned = comment_re.sub('', jsonStr)
This removes too much when // occurs in a string literal.
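The over-removal can be reproduced with a small sketch. The two-key input below is a hypothetical example, not the real data; it shows that [^}]* does not stop at the end of the line but runs on to the next closing brace, taking real content with it:

```python
import re

# The failing pattern from above: whitespace, //, then everything up to the next }
comment_re = re.compile(r'\s//[^}]*')

# Hypothetical minimal input: one trailing comment followed by a second key
text = '{\n"a" : 1, //note\n"b" : 2\n}'
result = comment_re.sub('', text)

# The "b" entry is swallowed together with the comment
print(repr(result))
```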
How can I make this work also for such inputs?
NB: A solution is already helpful if it doesn't deal with /* this type of comments */, so there is no need to cover that.
You could match quoted strings with a capture group and re-inject them into the result, so that comment delimiters occurring inside such strings are never treated as the start of a comment:
import re

comment_re = re.compile(
    r'//.*|/\*[\s\S]*?\*/|("(\\.|.)*?")',  # capture group 1 holds quoted strings
)
cleaned = comment_re.sub(r'\1', jsonStr)  # re-inject quoted strings
This approach does not require the JSONC input to be formatted with any particular indentation or line separators.
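As a rough end-to-end check, the same idea can be wrapped in a function and fed to json.loads. This is only a sketch: the names COMMENT_RE and strip_jsonc_comments and the sample input are made up for illustration, the inner group is made non-capturing, and a callable replacement is used instead of r'\1' so that it does not depend on how the re module treats an unmatched group (Python versions before 3.5 raise an error for that).

```python
import json
import re

# Alternatives are tried left to right at each position: a // comment,
# a /* */ comment, or a quoted string (the only capturing group).
COMMENT_RE = re.compile(r'//.*|/\*[\s\S]*?\*/|("(?:\\.|.)*?")')

def strip_jsonc_comments(text):
    """Hypothetical helper: drop comments, keep quoted strings unchanged."""
    # group(1) is None when a comment matched, so it is replaced by ''
    return COMMENT_RE.sub(lambda m: m.group(1) or '', text)

# Illustrative input with comment delimiters inside strings and an escaped quote
sample = '''
// leading comment
[
  {
    "test1" : "http://test.com", // trailing comment
    "//networkpath1" : true, /* block
    comment */
    "quote" : "a \\"quoted\\" // part"
  }
]
'''

data = json.loads(strip_jsonc_comments(sample))
print(data)
```

Because the string alternative consumes a whole literal, including escaped quotes, any // or /* inside it is skipped over before the comment alternatives get a chance to match there.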