Summary:
I need to filter PII data from a complex JSON object in one-pass with JQ. I'm not in scripting where deconstruct and/or multi-pass. I want to retain non-PII properties rather than remove PII properties because I want to avoid a PII leakage exposure if the backend adds a new PII property and doesn't let me know.
In simple cases, I can easily "reconstruct" the desired JSON object from the input object like this:
{
"data": {
"id": "123",
"pii": "sensitve"
}
"return-code": 200
}
jq '{data: {id: .data.id }, return-code: .return-code}'
Once arrays are added to the mix, I'm not seeing how to use this approach to solve this problem.
Simplified Example of the Complex Object
Input:
{
"customers": [
{
"id": "00000000001",
"dateOfBirth": "sensitive DOB",
"preferences": [
{
"preference-id": "0001",
"pii-value": "senstive value 1"
},
{
"preference-id": "0002",
"pii-value": "senstive value 2"
}
]
},
{
"id": "00000000002",
"dateOfBirth": "sensitive DOB",
"preferences": [
{
"preference-id": "0003",
"pii-value": "senstive value 3"
},
{
"preference-id": "0004",
"pii-value": "senstive value 4"
}
]
}
]
}
Desired Output:
{
"customers": [
{
"id": "00000000001",
"preferences": [
{
"preference-id": "0001"
},
{
"preference-id": "0002"
}
]
},
{
"id": "00000000002",
"preferences": [
{
"preference-id": "0003"
},
{
"preference-id": "0004"
}
]
}
]
}
Attempted Approach with Arrays:
jq '{ customers: [ { id: .customers[].id, preferences: [ .customers[].preferences ] } ]}'
Result starts pulling together permutations across different customers
{
"customers": [
{
"id": "00000000001",
"preferences": [
[
{
"preference-id": "0001",
"pii-value": "senstive value 1"
},
{
"preference-id": "0002",
"pii-value": "senstive value 2"
}
],
[
{
"preference-id": "0003",
"pii-value": "senstive value 3"
},
{
"preference-id": "0004",
"pii-value": "senstive value 4"
}
]
]
},
{
"id": "00000000002",
"preferences": [
[
{
"preference-id": "0001",
"pii-value": "senstive value 1"
},
{
"preference-id": "0002",
"pii-value": "senstive value 2"
}
],
[
{
"preference-id": "0003",
"pii-value": "senstive value 3"
},
{
"preference-id": "0004",
"pii-value": "senstive value 4"
}
]
]
}
]
}
I really don't think this approach will work at all and I'm at a loss for other approaches. This is a simplified example, the actual JSON is quite large with many arrays at various nested levels.
Any suggestions for what approach I might investigate?
Using a user function for picking specific paths from a JSON:
def pick(paths):
. as $in
| reduce path(paths) as $path (null;
setpath($path; $in | getpath($path))
);
pick(.customers[] | .id, .preferences[]."preference-id")