arraysjsoncsvjq

Convert JSON file to CSV using jq, expanding nested array data in multiple columns


I need to convert a JSON file to CSV in a bash script, this a sample file:

{
  "total": 1,
  "jobs": [
    {
      "deviceData": {
        "deviceId": "ad7bcba0",
        "name": "Device name",
        "serialNumber": "Serial",
        "productNumber": "Product",
        "type": "Type",
        "group": "Group",
        "organizationId": "OrgID",
        "site": {
          "siteId": "SiteID",
          "name": "Name",
          "primaryAddress": {
            "addressLine1": "Street 1",
            "locality": "City",
            "region": "Country",
            "country": "IT",
            "postalCode": "12345",
            "addressId": "67890",
            "name": "name"
          },
          "timezone": "Europe/Rome",
          "organizationId": "organization",
          "Region": "EMEA"
        }
      },
      "job_name": "Solo_Spot_bianco_120x60~3_p1_r1_c1",
      "product_name": "Product",
      "start_time": "2025-12-04T08:14:12.000+00:00",
      "end_time": "2025-12-04T08:31:26.000+00:00",
      "result": "Completed",
      "printed_copies": "1",
      "requested_copies": "1",
      "substrate_name": "vendor",
      "substrate_usage": {
        "value": 0.72451466,
        "unit": "squareMeters"
      },
      "print_mode": "Passes:17 - InkDensity:100 - InkDensityB:NA - ColorMode:CMYKcmW - WhiteMode:Spot - IsHeatSensitive:False - WhiteDensity:260",
      "paper_mode": "solo bianco",
      "dual_side": "No",
      "total_ink_usage": {
        "value": "109.850006",
        "unit": "milliliters"
      },
      "ink_usage_by_channel": [
        {
          "color": "LM",
          "consumption": "1.870000"
        },
        {
          "color": "OP",
          "consumption": "7.900001"
        },
        {
          "color": "C",
          "consumption": "1.860000"
        },
        {
          "color": "OC",
          "consumption": "9.000000"
        },
        {
          "color": "LC",
          "consumption": "1.870000"
        },
        {
          "color": "W",
          "consumption": "81.490005"
        },
        {
          "color": "Y",
          "consumption": "2.140000"
        },
        {
          "color": "M",
          "consumption": "1.860000"
        },
        {
          "color": "MK",
          "consumption": "1.860000"
        }
      ],
      "printing_time": "1034",
      "side_used": "Off",
      "isPrinterRoutine": false
    }
  ]
}

If I use this command in a bash script:

jq -r '.jobs[] | [ .job_name, .product_name, .printing_time ] | @csv' file.json

I obtain this result:

"Solo_Spot_bianco_120x60~3_p1_r1_c1","Product","1034"

But I don't know how to extract the nested array variable .ink_usage_by_channel.color and .consumption, my goal is to obtain this result:

"Solo_Spot_bianco_120x60~3_p1_r1_c1","Product","1034","LM","1.870000","OP","7.900001","C","1.860000"

Please note that the numbers of item in the nested array is fixed, I will always find 9 couple of color and consumption items.

I know that if I do this:

jq -r '.jobs[0].ink_usage_by_channel[] | [ .color, .consumption ] | @csv' file.json

I obtain this:

"LM","1.110000"
"OP","3.660000"
"C","0.210000"
"OC","5.330000"
"LC","1.390000"
"W","26.090002"
"Y","1.150000"
"M","0.200000"
"MK","0.200000"

But I would like to expand this array in columns. Is it possible?


Solution

  • how to extract the nested array variable .ink_usage_by_channel.color and .consumption, my goal is to obtain this result:

    "Solo_Spot_bianco_120x60~3_p1_r1_c1","Product","1034","LM","1.870000","OP","7.900001","C","1.860000"
    

    Do you want to interate over and extract only the first three records of the .ink_usage_by_channel array? Use .[:3][]

    .jobs[] | [.job_name, .product_name, .printing_time, (
      .ink_usage_by_channel[:3][] | .color, .consumption
    )] | @csv
    
    

    Demo

    Or du you only want to iterate over and extract the records of colors LM, OP, and C? Create an INDEX with the color names as references:

    .jobs[] | INDEX(.ink_usage_by_channel[]; .color) as $ink
    | [.job_name, .product_name, .printing_time, (
        $ink["LM", "OP", "C"] | .color, .consumption
      )]
    | @csv
    

    Demo

    For this input, both produce:

    "Solo_Spot_bianco_120x60~3_p1_r1_c1","Product","1034","LM","1.870000","OP","7.900001","C","1.860000"