I have a bit of a thorny JSON manipulation problem. I have half a mind to just write a Python program to do it, but I’m wondering if a well-written jq query can solve it more elegantly, partly for a cleaner solution and partly for pedagogical purposes. (I’m a jq noob and would love to take this opportunity to learn.)
I have the following JSON, printed from a tool whose output format I cannot modify:
[
{
"ExifTool:ExifTool:ExifTool": {
"ExifToolVersion": 12.76
},
"SourceFile": "./_DSC5848.JPG",
"File:System:Other": {
"FileName": "_DSC5848.JPG",
"Directory": ".",
"FileSize": "82 kB",
"FilePermissions": "-rw-r--r--"
},
"EXIF:ExifIFD:Camera": {
"ExposureProgram": "Aperture-priority AE",
"MaxApertureValue": 1.4,
"Sharpness": "Normal"
},
"File:System:Time": {
"FileModifyDate": "2024:09:24 14:10:16-07:00",
"FileAccessDate": "2024:09:28 00:13:26-07:00",
"FileInodeChangeDate": "2024:09:25 23:26:20-07:00"
},
"EXIF:ExifIFD:Image": {
"ExposureTime": "1/50",
"FNumber": 4.0,
"ISO": 200
},
... additional arbitrary colon-keys ...
},
{ ... },
{ ... },
{ ... },
{ ... }
]
I need the keys containing colons (I’ll call them “colon-keys”) to be recursively “unrolled” such that "A:B:C": { ... } becomes:
"A": {
"B": {
"C": { ... }
}
}
Colon-keys with identical prefixes should be merged. For example, if there is also a colon-key "A:B:D": { ... }, the above would become:
"A": {
"B": {
"C": { ... },
"D": { ... }
}
}
Preserving the order of keys isn’t crucial, though it’d be cool if possible. It’s not known in advance what the names of the colon-keys will be, so hard-coding them unfortunately isn’t an option.
[Update about 12 hours after initial post to clarify how arrays behave] The input may be an array of objects (as given in the example), or it may be a single object.
In addition, this unrolling should recursively descend into arrays, so the following input:
{
"A:B": [
{
"C:D:E": { ... }
},
{
"C:D:F": { ... }
}
]
}
would produce the following output:
{
"A": {
"B": [
{
"C": {
"D": {
"E": { ... }
}
}
},
{
"C": {
"D": {
"F": { ... }
}
}
}
]
}
}
Also notice from the above that the unrolling should not merge colon-keys across list elements; i.e., it should NOT produce this:
{
"A": {
"B": [
{
"C": {
"D": {
"E": { ... },
"F": { ... }
}
}
}
]
}
}
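To pin down the rules above (split on colons, merge shared prefixes, descend into arrays without merging across elements), here is a minimal sketch of the hand-rolled Python route. The function name `unroll` and the sample values are hypothetical, and the sketch assumes no two keys collide on the same full path (for example "A:B:C" followed by a later "A:B" would overwrite the earlier branch).

```python
def unroll(node):
    """Recursively expand colon-keys; lists are handled element by element."""
    if isinstance(node, list):
        # Each element is unrolled on its own, so nothing merges across elements.
        return [unroll(item) for item in node]
    if not isinstance(node, dict):
        return node  # scalars pass through unchanged
    out = {}
    for key, value in node.items():
        *parents, last = key.split(":")
        target = out
        for part in parents:
            # Reuse an existing branch so that shared prefixes are merged.
            target = target.setdefault(part, {})
        # Assumption: no other key has already claimed this exact path.
        target[last] = unroll(value)
    return out


if __name__ == "__main__":
    import json

    # Hypothetical stand-in for the array example; real payloads are elided above.
    sample = {"A:B": [{"C:D:E": {"x": 1}}, {"C:D:F": {"y": 2}}]}
    print(json.dumps(unroll(sample), indent=2))
```

Because Python dicts keep insertion order (3.7 and later), the output keys appear in the order their first prefix was encountered, which covers the nice-to-have about key order.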
So to circle back to the example from the beginning of this post, the output would look like:
[
{
"ExifTool": {
"ExifTool": {
"ExifTool": {
"ExifToolVersion": 12.76
}
}
},
"SourceFile": "./_DSC5848.JPG",
"File": {
"System": {
"Other": {
"FileName": "_DSC5848.JPG",
"Directory": ".",
"FileSize": "82 kB",
"FilePermissions": "-rw-r--r--"
},
"Time": {
"FileModifyDate": "2024:09:24 14:10:16-07:00",
"FileAccessDate": "2024:09:28 00:13:26-07:00",
"FileInodeChangeDate": "2024:09:25 23:26:20-07:00"
}
}
},
"EXIF": {
"ExifIFD": {
"Camera": {
"ExposureProgram": "Aperture-priority AE",
"MaxApertureValue": 1.4,
"Sharpness": "Normal"
},
"Image": {
"ExposureTime": "1/50",
"FNumber": 4.0,
"ISO": 200
}
}
}
},
{ ... },
{ ... },
{ ... },
{ ... }
]
Is this possible to do with a well-written jq query, or is my only option a hand-rolled program?
Bonus: would such a query be able to handle colon-keys of arbitrary length (A:B, A:B:C, A:B:C:D, etc.) and at arbitrary levels of the JSON ("A:B:C": { "D:E": { ... } })?
Update: Solved. See @pmf’s solution, which is extremely elegant and very robust: it handles arbitrarily nested objects and lists in a very short query.
Break up the document into a stream of path-value pairs using tostream, while discarding back-tracking items by selecting only the ones that have a value (at position 1). Then, re-arrange the path arrays by splitting strings (not numbers) at colons. Finally, re-construct the output object using setpath.
reduce (tostream | select(has(1))) as $i (null;
setpath($i[0] | map(numbers // splits(":")); $i[1])
)
[
{
"ExifTool": {
"ExifTool": {
"ExifTool": {
"ExifToolVersion": 12.76
}
}
},
"SourceFile": "./_DSC5848.JPG",
"File": {
"System": {
"Other": {
"FileName": "_DSC5848.JPG",
"Directory": ".",
"FileSize": "82 kB",
"FilePermissions": "-rw-r--r--"
},
"Time": {
"FileModifyDate": "2024:09:24 14:10:16-07:00",
"FileAccessDate": "2024:09:28 00:13:26-07:00",
"FileInodeChangeDate": "2024:09:25 23:26:20-07:00"
}
}
},
"EXIF": {
"ExifIFD": {
"Camera": {
"ExposureProgram": "Aperture-priority AE",
"MaxApertureValue": 1.4,
"Sharpness": "Normal"
},
"Image": {
"ExposureTime": "1/50",
"FNumber": 4.0,
"ISO": 200
}
}
}
}
]
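For readers weighing this against the hand-rolled route mentioned in the question, the same path-based idea can be sketched in Python. This is only an approximation of what the jq query does, not a port of jq itself: `leaves`, `set_path`, and `unroll_streamed` are hypothetical helper names, `leaves` only loosely imitates tostream (it yields leaf events and never produces the back-tracking items), and the sketch assumes no path tries to treat an existing scalar as a container.

```python
def leaves(node, path=()):
    """Yield (path, value) for every leaf, loosely like jq's tostream leaf events."""
    if isinstance(node, dict) and node:
        for key, value in node.items():
            yield from leaves(value, path + (key,))
    elif isinstance(node, list) and node:
        for index, value in enumerate(node):
            yield from leaves(value, path + (index,))
    else:
        # Scalars and empty containers are emitted as leaves.
        yield path, node


def set_path(root, path, value):
    """Assign value at path, creating dicts or lists as needed (like setpath)."""
    if not path:
        return value
    head, rest = path[0], path[1:]
    if isinstance(head, int):
        root = [] if root is None else root
        while len(root) <= head:
            root.append(None)  # pad missing list slots
        root[head] = set_path(root[head], rest, value)
    else:
        root = {} if root is None else root
        root[head] = set_path(root.get(head), rest, value)
    return root


def unroll_streamed(doc):
    """Rebuild doc with every string path component split at colons."""
    result = None
    for path, value in leaves(doc):
        new_path = []
        for part in path:
            # Only string keys are split; integer list indices are kept as-is.
            new_path.extend(part.split(":") if isinstance(part, str) else [part])
        result = set_path(result, tuple(new_path), value)
    return result
```

Since list indices stay in the path untouched, entries from different list elements never land in the same object, which is why nothing merges across elements. Only keys are split; string values that contain colons, such as the timestamps, are left alone.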