I have a jCal JSON array which I'd like to filter with jq. JSON arrays are somewhat new to me and I have been banging my head to the wall on this for hours...
The file looks like this:
[
"vcalendar",
[
[
"calscale",
{},
"text",
"GREGORIAN"
],
[
"version",
{},
"text",
"2.0"
],
[
"prodid",
{},
"text",
"-//SabreDAV//SabreDAV//EN"
],
[
"x-wr-calname",
{},
"unknown",
"Call log private"
],
[
"x-apple-calendar-color",
{},
"unknown",
"#ffaa00"
],
[
"refresh-interval",
{},
"duration",
"PT4H"
],
[
"x-published-ttl",
{},
"unknown",
"PT4H"
]
],
[
[
"vevent",
[
[
"dtstamp",
{},
"date-time",
"2015-04-05T16:42:10Z"
],
[
"created",
{},
"date-time",
"2015-02-18T16:44:04Z"
],
[
"uid",
{},
"text",
"9b23142b-8d86-3e17-2f44-2bed65b2e471"
],
[
"last-modified",
{},
"date-time",
"2015-04-05T16:42:10Z"
],
[
"description",
{},
"text",
"Phone call to +49xxxxxxxxxx lasted for 0 seconds."
],
[
"summary",
{},
"text",
"Outgoing: +49xxxxxxx"
],
[
"dtstart",
{},
"date-time",
"2015-02-18T10:58:12Z"
],
[
"dtend",
{},
"date-time",
"2015-02-18T10:58:44Z"
],
[
"transp",
{},
"text",
"OPAQUE"
]
],
[]
],
[
"vevent",
[
[
"dtstamp",
{},
"date-time",
"2015-04-05T16:42:10Z"
],
[
"created",
{},
"date-time",
"2015-01-09T19:12:05Z"
],
[
"uid",
{},
"text",
"c337e092-a012-5f5a-497f-932fbc6159e5"
],
[
"last-modified",
{},
"date-time",
"2015-04-05T16:42:10Z"
],
[
"description",
{},
"text",
"Phone call to +1xxxxxxxxxx lasted for 39 seconds."
],
[
"summary",
{},
"text",
"Outgoing: +1xxxxxxxxxx"
],
[
"dtstart",
{},
"date-time",
"2015-01-09T17:23:16Z"
],
[
"dtend",
{},
"date-time",
"2015-01-09T17:24:19Z"
],
[
"transp",
{},
"text",
"OPAQUE"
]
],
[]
],
]
]
I would like to filter out dtstart, dtend, the target phone number and the connection duration from the description for each vevent which was created e.g. in January 2019 ("2019-01.*") and output them as a CSV.
This JSON is a bit strange because the information is stored position-based in an array instead of an object. Using the first element of an array ("vevent") to identify its contents is not the best practice.
But anyway ... if this is the data source you are dealing with, this code should help you.
jq -r '..
| arrays
| select(.[0] == "vevent")[1]
| [
(.[] | select(.[0] == "dtstart") | .[3]),
(.[] | select(.[0] == "dtend") | .[3]),
(.[] | select(.[0] == "description") | .[3])
]
| @csv
'
Alternatively, the repeating code can be transferred into a function
jq -r 'def getField($name; $idx): .[] | select(.[0] == $name) | .[$idx];
..
| arrays
| select(.[0] == "vevent")[1]
| [ getField("dtstart"; 3), getField("dtend"; 3), getField("description"; 3) ]
| @csv
'
Output
"2015-02-18T10:58:12Z","2015-02-18T10:58:44Z","Phone call to +49xxxxxxxxxx lasted for 0 seconds."
"2015-01-09T17:23:16Z","2015-01-09T17:24:19Z","Phone call to +1xxxxxxxxxx lasted for 39 seconds."
You can also extract phone number and duration with the help of regular expressions in jq
:
jq -r 'def getField($name; $idx): .[] | select(.[0] == $name) | .[$idx];
..
| arrays
| select(.[0] == "vevent")[1]
| [
getField("dtstart"; 3),
getField("dtend"; 3),
(getField("description"; 3) | match("call to ([^ ]*)") | .captures[0].string),
(getField("description"; 3) | match("(\\d+) seconds") | .captures[0].string)
]
| @csv
'
Output
"2015-02-18T10:58:12Z","2015-02-18T10:58:44Z","+49xxxxxxxxxx","0"
"2015-01-09T17:23:16Z","2015-01-09T17:24:19Z","+1xxxxxxxxxx","39"