I have a weird CSV with one column being JSON data of quite a large size and one is a name. Showing proper JSON formatting for clarity, but the actual file is flat with no newline chars.
name,
{
"field1":
{"name":"foo",
"email":"test@gmail.com"},
"field2":{
"subfield":{
"subfield2":{
"active":1,
"passive":11,
"running":111
}
}
}
}
The input without pretty formatting…
name,{"field1":{"name":"foo","email":"test@gmail.com"},"field2":{"subfield":{"subfield2":{"active":1,"passive":11,"running":111}}}}
I am trying to get the name and all unique values for the subfield 2 into a output (ideally CSV).
name,
active passive running
The issue is that the number of fields is not constant and the names of the subfields are not constant either. I have tried using jq, Miller, and sed/awk without much luck since it combines a huge JSON blob and CSV data.
If you run in your sample input
<input.txt sed '/name,/d' | jq -cr '.field2.subfield.subfield2 | keys[]' | paste -s -d ' ' | mlr --csv --implicit-csv-header then label name
you get
name
active passive running
The sample output you write, is wrong, because if it's a one field CSV, you do not have the ,
after name, there are no other fields
name,
active passive running
But I probably didn't understand what you want