I am writing a bash script to create a csv from a sample inconsistent json data but running into a bunch of issues :
My json data looks like this :
json_data='[
{
"ResourceARN": "arn",
"Tags": [
{
"Key": "STAGE",
"Value": "st"
}
]
},
{
"ResourceARN": "arn",
"Tags": [
{
"Key": "STAGE",
"Value": "st"
}
]
},
{
"ResourceARN": "arn",
"Tags": [
{
"Key": "aws:cloudformation:stack-name",
"Value": "aud"
},
{
"Key": "CostCenter",
"Value": "Development"
},
{
"Key": "aws:cloudformation:stack-id",
"Value": "arn"
},
{
"Key": "STAGE",
"Value": "ec"
},
{
"Key": "Environment",
"Value": "development"
},
{
"Key": "Region",
"Value": "us-west-2"
},
{
"Key": "Service",
"Value": "MP"
},
{
"Key": "aws:cloudformation:logical-id",
"Value": "ApiGatewayRestApi"
},
{
"Key": "Team",
"Value": "VNP"
}
]
} and so on
]'
I am trying to create a csv with the following column names : ARN, Stage, CostCenter, Service, Domain, Team and the values should be what is found in the key value pairs of the Tags array and whatever key value pairs do not exist then just leave the column as blank.
My code looks like this :
# Define the columns
columns=("ARN" "STAGE" "CostCenter" "Service" "Domain" "Team")
# Create CSV header
header=$(IFS=','; echo "${columns[*]}")
# Function to get value for a specific key in the Tags array
get_value() {
key="$1"
tags="$2"
value=$(echo "$tags" | jq -r --arg key "$key" '.[] | select(.Key == $key) | .Value')
if [ "$value" == "null" ]; then
echo ""
else
echo "$value"
fi
}
# Create the CSV data
csv_data=""
for item in "${json_data[@]}"; do
resource_arn=$(echo "$item" | jq -r '.ResourceARN')
row="$resource_arn"
for col in "${columns[@]}"; do
value=$(get_value "$col" "$(echo "$item" | jq -c '.Tags')")
row="$row,$value"
done
csv_data="$csv_data$row\n"
done
# Output the CSV to a file
echo -e "$header\n$csv_data" > output.csv
echo "CSV data has been written to output.csv"
But i keep running into
jq: error (at <stdin>:57): Cannot index array with string "ResourceARN"
jq: error (at <stdin>:57): Cannot index array with string "Tags"
jq: error (at <stdin>:57): Cannot index array with string "Tags"
jq: error (at <stdin>:57): Cannot index array with string "Tags"
jq: error (at <stdin>:57): Cannot index array with string "Tags"
jq: error (at <stdin>:57): Cannot index array with string "Tags"
jq: error (at <stdin>:57): Cannot index array with string "Tags"
Basically the final CSV should look like
ARN STAGE CostCenter Service Domain Team
arn st. "" "" "" ""
arn st "" "" "" ""
arn ec Development MP. "" VNP
....
Can someone please point out what i'm doing wrong or if there is a simple way of doing it? i would imagine jq
would simply leave the values blank and output the ones having values? Thank you.
Since you state you wish to produce CSV, let me start with a filter that does just that:
["ARN", "STAGE", "CostCenter", "Service", "Domain", "Team"] as $columns
| $columns,
(.[]
| .ResourceARN as $ARN
| .Tags | from_entries
| [$ARN, .[ $columns[1:][]]] )
| @csv
Your illustrative output seems to correspond more closely to TSV, and you indicate you want missing fields to be represented by ""
, so we could tweak the above accordingly:
["ARN", "STAGE", "CostCenter", "Service", "Domain", "Team"] as $columns
| "\"\"" as $q
| $columns,
(.[]
| .ResourceARN as $ARN
| .Tags | from_entries
| [$ARN, (.[ $columns[1:][]] | (. // $q ) ) ] )
| @tsv
(The trick here is using STREAM | (. // $q)
instead of STREAM // $q
.)
With your sample input, this produces:
ARN STAGE CostCenter Service Domain Team
arn st "" "" "" ""
arn st "" "" "" ""
arn ec Development MP "" VNP
You might wish to consider other variants, e.g. using join(",")
instead of @csv
or @tsv
.