Can jq unroll a list of objects by arbitrary key name?


I have a bit of a thorny JSON manipulation problem. I have half a mind to just write a Python program to do it, but I’m wondering if a well-written jq query can solve it more elegantly — partly for a cleaner solution, and partly for pedagogic purposes. (I’m a jq noob and would love to take this opportunity to learn.)

I have the following JSON, printed from a tool whose output format I cannot modify:

[
  {
    "ExifTool:ExifTool:ExifTool": {
      "ExifToolVersion": 12.76
    },
    "SourceFile": "./_DSC5848.JPG",
    "File:System:Other": {
      "FileName": "_DSC5848.JPG",
      "Directory": ".",
      "FileSize": "82 kB",
      "FilePermissions": "-rw-r--r--"
    },
    "EXIF:ExifIFD:Camera": {
      "ExposureProgram": "Aperture-priority AE",
      "MaxApertureValue": 1.4,
      "Sharpness": "Normal"
    },
    "File:System:Time": {
      "FileModifyDate": "2024:09:24 14:10:16-07:00",
      "FileAccessDate": "2024:09:28 00:13:26-07:00",
      "FileInodeChangeDate": "2024:09:25 23:26:20-07:00"
    },
    "EXIF:ExifIFD:Image": {
      "ExposureTime": "1/50",
      "FNumber": 4.0,
      "ISO": 200
    },
    ... additional arbitrary colon-keys ...
  },
  { ... },
  { ... },
  { ... },
  { ... }
]

I need the keys containing colons (I’ll call them “colon-keys”) to be recursively “unrolled” such that "A:B:C": { ... } becomes:

"A": {
  "B": {
    "C": { ... }
  }
}

Colon-keys with identical prefixes would be merged. For example, if there is also a colon-key "A:B:D": { ... }, the above would become:

"A": {
  "B": {
    "C": { ... },
    "D": { ... }
  }
}

Preserving the order of keys isn’t crucial, though it’d be cool if possible. It’s not known in advance what the names of the colon-keys will be, so hard-coding them unfortunately isn’t an option.


[Update about 12 hours after initial post to clarify how arrays behave] The input may be an array of objects (as given in the example), or it may be a single object.

In addition, this unrolling should recursively descend into arrays, so the following input:

{
  "A:B": [
    {
      "C:D:E": { ... }
    },
    {
      "C:D:F": { ... }
    }
  ]
}

Would produce the following output:

{
  "A": {
    "B": [
      {
        "C": {
          "D": {
            "E": { ... }
          }
        }
      },
      {
        "C": {
          "D": {
            "F": { ... }
          }
        }
      }
    ]
  }
}

Also notice from the above that the unrolling should not merge colon-keys across list elements; i.e., it should NOT produce the following:

{
  "A": {
    "B": [
      {
        "C": {
          "D": {
            "E": { ... },
            "F": { ... }
          }
        }
      }
    ]
  }
}

So to circle back to the example from the beginning of this post, the output would look like:

[
  {
    "ExifTool": {
      "ExifTool": {
        "ExifTool": {
          "ExifToolVersion": 12.76
        }
      }
    },
    "SourceFile": "./_DSC5848.JPG",
    "File": {
      "System": {
        "Other": {
          "FileName": "_DSC5848.JPG",
          "Directory": ".",
          "FileSize": "82 kB",
          "FilePermissions": "-rw-r--r--"
        },
        "Time": {
          "FileModifyDate": "2024:09:24 14:10:16-07:00",
          "FileAccessDate": "2024:09:28 00:13:26-07:00",
          "FileInodeChangeDate": "2024:09:25 23:26:20-07:00"
        }
      }
    },
    "EXIF": {
      "ExifIFD": {
        "Camera": {
          "ExposureProgram": "Aperture-priority AE",
          "MaxApertureValue": 1.4,
          "Sharpness": "Normal"
        },
        "Image": {
          "ExposureTime": "1/50",
          "FNumber": 4.0,
          "ISO": 200
        }
      }
    }
  },
  { ... },
  { ... },
  { ... },
  { ... }
]

Is this possible to do with a well-written jq query, or is my only option a hand-rolled program?

Bonus, would such a query be able to handle colon-keys of arbitrary length (A:B, A:B:C, A:B:C:D, etc.) and at arbitrary levels of the JSON ("A:B:C": { "D:E": { ... } })?
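
For comparison, here is a rough sketch of what the hand-rolled Python route might look like. The names `unroll` and `deep_merge` are made up for illustration, and the sketch assumes that when two keys collide on a non-object value (say `"A": 1` next to `"A:B": 2`), the later one simply wins. It is only one possible baseline, not a reference implementation.

```python
def deep_merge(dst, src):
    """Merge src into dst in place; nested dicts are merged, anything else is overwritten."""
    for key, value in src.items():
        if isinstance(dst.get(key), dict) and isinstance(value, dict):
            deep_merge(dst[key], value)
        else:
            dst[key] = value  # assumption: on a conflict the later value wins
    return dst


def unroll(node, sep=":"):
    """Recursively expand keys such as "A:B:C" into nested objects."""
    if isinstance(node, list):
        # Each element is unrolled on its own, so nothing merges across list elements.
        return [unroll(item, sep) for item in node]
    if not isinstance(node, dict):
        return node  # scalars pass through unchanged
    result = {}
    for key, value in node.items():
        nested = unroll(value, sep)
        # Wrap from the innermost key part outward: "A:B:C" -> {"A": {"B": {"C": value}}}
        for part in reversed(key.split(sep)):
            nested = {part: nested}
        deep_merge(result, nested)  # shared prefixes end up in the same subtree
    return result
```

Because Python dicts keep insertion order, the first appearance of each prefix determines where it lands in the output, which happens to match the key order in the expected result above.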


Update: Solved. See @pmf’s solution, which is extremely elegant and very robust: a very short query that handles arbitrarily nested objects and lists.


Solution

  • Break up the document into a stream of path-value pairs using tostream, discarding the back-tracking events by selecting only the items that have a value (at index 1). Then rearrange each path array by splitting its string elements (the object keys, not the numeric array indices) at colons. Finally, reconstruct the output document using setpath.

    reduce (tostream | select(has(1))) as $i (null;
      setpath($i[0] | map(numbers // splits(":")); $i[1])
    )
    
    [
      {
        "ExifTool": {
          "ExifTool": {
            "ExifTool": {
              "ExifToolVersion": 12.76
            }
          }
        },
        "SourceFile": "./_DSC5848.JPG",
        "File": {
          "System": {
            "Other": {
              "FileName": "_DSC5848.JPG",
              "Directory": ".",
              "FileSize": "82 kB",
              "FilePermissions": "-rw-r--r--"
            },
            "Time": {
              "FileModifyDate": "2024:09:24 14:10:16-07:00",
              "FileAccessDate": "2024:09:28 00:13:26-07:00",
              "FileInodeChangeDate": "2024:09:25 23:26:20-07:00"
            }
          }
        },
        "EXIF": {
          "ExifIFD": {
            "Camera": {
              "ExposureProgram": "Aperture-priority AE",
              "MaxApertureValue": 1.4,
              "Sharpness": "Normal"
            },
            "Image": {
              "ExposureTime": "1/50",
              "FNumber": 4.0,
              "ISO": 200
            }
          }
        }
      }
    ]
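
    For readers more at home in Python, the three steps of the query (stream the leaves, split the string path elements, rebuild with setpath) can be mirrored roughly as follows. This is only an illustrative sketch: `leaf_events`, `split_path`, `set_path` and `unroll_stream` are hypothetical helper names, and `set_path` is a simplified stand-in for jq's setpath that does not reproduce jq's errors on type conflicts.

```python
import copy


def leaf_events(node, path=()):
    """Yield (path, leaf) pairs, roughly what `tostream | select(has(1))` keeps."""
    if isinstance(node, dict) and node:
        for key, value in node.items():
            yield from leaf_events(value, path + (key,))
    elif isinstance(node, list) and node:
        for index, value in enumerate(node):
            yield from leaf_events(value, path + (index,))
    else:
        # Scalars and empty containers are leaves; the shallow copy keeps
        # empty containers from being shared with the input.
        yield path, copy.copy(node)


def split_path(path, sep=":"):
    """Keep integer (array index) steps, split string (object key) steps at sep."""
    out = []
    for step in path:
        if isinstance(step, int):
            out.append(step)
        else:
            out.extend(step.split(sep))
    return out


def set_path(root, path, value):
    """Simplified setpath: create lists and dicts along the path as needed."""
    if not path:
        return value
    head, rest = path[0], path[1:]
    if isinstance(head, int):
        root = [] if root is None else root
        while len(root) <= head:
            root.append(None)  # pad missing slots with None (null)
        root[head] = set_path(root[head], rest, value)
    else:
        root = {} if root is None else root
        root[head] = set_path(root.get(head), rest, value)
    return root


def unroll_stream(doc, sep=":"):
    """Rebuild doc from its leaves, with every colon-key split into path steps."""
    result = None  # plays the role of the `null` seed in the reduce
    for path, leaf in leaf_events(doc):
        result = set_path(result, split_path(path, sep), leaf)
    return result
```

    Only paths are split, never leaf values, which is why timestamps such as "2024:09:24 14:10:16-07:00" come through untouched. Array indices stay in the path as integers, so keys in different list elements can never be merged.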
    
