Tags: json, jq, dicom, dcmtk

How to filter out elements with a particular property (or keep elements without that property)


I'm processing the output from dcm2json, which converts the metadata from medical imaging data in the DICOM format to JSON. The values for this metadata are mostly strings, integers, floats, and similar, but they also include inline binary values in the form of base64-encoded strings. We don't need those binaries and they can get pretty large, so I need to filter out all metadata elements that have an InlineBinary property. Below is a very small, simplified sample of the JSON output from dcm2json:

{
    "00080005": {
        "vr": "CS",
        "Value": ["ISO_IR 192"]
    },
    "00291010": {
        "vr": "OB",
        "InlineBinary": "Zm9vYmFyCg=="
    }
}

I want to transform this to:

{
    "00080005": {
        "vr": "CS",
        "Value": ["ISO_IR 192"]
    }
}

I tried a few different things that didn't work but finally ended up using this:

$ dcm2json file.dcm | jq '[to_entries | .[] | select(.value.Value)] | from_entries'

I kept playing with it though because I don't like having that expression embedded in the array (i.e. [to_entries ...]). I came up with something a bit more elegant, but I'm totally stumped as to why it works the way it does:

jq 'to_entries | . - map(select(.value | has("InlineBinary") == true)) | from_entries' | less

What's confusing is the has("InlineBinary") == true bit. I first ran this comparing it to false, because what I wanted was the elements that don't have the InlineBinary property. Why does it seem to do the opposite of what I think I'm requesting? Given that I really don't understand what's happening with the . - map(...) structure in there (I totally swiped it from another post where someone asked a similar question), I'm not surprised it does something unexpected, but I'd like to understand why :)

The other thing I'm confused about is to_entries/from_entries/with_entries. The manual says about these:

with_entries(foo) is a shorthand for to_entries | map(foo) | from_entries

Cool! So that would be:

jq 'with_entries(map( . - map(select(.value | has("InlineBinary") == true))))'

But that doesn't work:

$ cat 1.json | jq 'with_entries(map(. - map(select(.value | has("InlineBinary") == true))))'
jq: error (at <stdin>:848): Cannot iterate over string ("00080005")

Given that this statement is supposed to be functionally equivalent, I'm not sure why this wouldn't work.

Thanks for any info you can provide!


Solution

  • When selecting key-value pairs, with_entries is often the tool of choice:

    with_entries( select(.value | has("InlineBinary") | not) )
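
To make the two puzzles from the question concrete, here is a small sketch run against the sample JSON above. It assumes jq is on your PATH; the shell variable names (sample, kept, longhand, subtracted) are only illustrative and not part of jq or dcm2json.

```shell
# Sample input copied from the question.
sample=$(cat <<'EOF'
{
    "00080005": {
        "vr": "CS",
        "Value": ["ISO_IR 192"]
    },
    "00291010": {
        "vr": "OB",
        "InlineBinary": "Zm9vYmFyCg=="
    }
}
EOF
)

# 1. with_entries runs its filter once per {key, value} entry,
#    so select() goes in directly, with no extra map().
kept=$(printf '%s' "$sample" |
  jq -c 'with_entries(select(.value | has("InlineBinary") | not))')

# 2. The longhand that with_entries abbreviates: the map() is already
#    part of the expansion.
longhand=$(printf '%s' "$sample" |
  jq -c 'to_entries | map(select(.value | has("InlineBinary") | not)) | from_entries')

# 3. Array subtraction: map(select(...)) builds the list of entries to
#    REMOVE, and ". - list" drops them from the full list of entries.
#    So the test must be true for the entries you want gone.
subtracted=$(printf '%s' "$sample" |
  jq -c 'to_entries | . - map(select(.value | has("InlineBinary"))) | from_entries')

echo "$kept"
echo "$longhand"
echo "$subtracted"
```

All three commands should print the same object, containing only the 00080005 element. This also suggests why the with_entries(map(...)) attempt failed: inside with_entries, . is already a single entry such as {"key": "00080005", "value": {...}}, so the extra map iterates over that entry's values, and the inner map(select(...)) ends up trying to iterate over the string "00080005", which is the error jq reported.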