[SOLVED] How to select value from CSV containing JSON as a column

How to select value from CSV containing JSON as a column

I have a weird CSV with one column being JSON data of quite a large size and one is a name. Showing proper JSON formatting for clarity, but the actual file is flat with no newline chars.

name, 
{
"field1":
   {"name":"foo",
   "email":"test@gmail.com"},
"field2":{
     "subfield":{
          "subfield2":{
               "active":1,
               "passive":11,
               "running":111
                       }
                  }
           }
}

The input without pretty formatting…

name,{"field1":{"name":"foo","email":"test@gmail.com"},"field2":{"subfield":{"subfield2":{"active":1,"passive":11,"running":111}}}}

I am trying to get the name and all unique values for the subfield 2 into a output (ideally CSV).

name, 
active passive running

The issue is that the number of fields is not constant and the names of the subfields are not constant either. I have tried using jq, Miller, and sed/awk without much luck since it combines a huge JSON blob and CSV data.

Solution

If you run in your sample input

<input.txt sed '/name,/d' | jq -cr '.field2.subfield.subfield2 | keys[]' | paste -s -d ' ' | mlr --csv --implicit-csv-header then label name

you get

name
active passive running

The sample output you write, is wrong, because if it's a one field CSV, you do not have the , after name, there are no other fields

name, 
active passive running

But I probably didn't understand what you want