I have a log message like this
{"level":"INFO","location":"log_missing_items:xxx","message":"Missing_Items: Item1, Item2, Item3 ...","timestamp":"2025-01-02 21:46:15,382+0000","service":"xxxx","xray_trace_id":"xxxxxxxxx"}
I am trying to write a cloudwatch insights query to get only the missing items as rows so that it can be displayed in a cloudwatch insights query widget in a dashboard.
Expected result
Item1
Item2
Item3
is this possible with AWS cloudwatch Insights query ?
You can now use OpenSearch SQL language introduced in december 2024 during the AWS re:Invent 2024 to query your logs with Cloudwatch logs insight. You will find more details in this blog written by Elizabeth Fuentes and her team.
Here's an SQL
example to achieve your needs:
SELECT distinct get(split(regexp_extract(`@message`, 'Missing_Items: (.*)'), '[,]'), b.index) as missingItem
FROM `LogGroupA` a
INNER JOIN (
SELECT ROW_NUMBER() OVER w - 1 index, MAX(regexp_count(regexp_extract(`@message`, 'Missing_Items: (.*)'), '[^,]+')) OVER w max_missing_items
FROM `LogGroupA`
WINDOW w AS (ORDER BY id)
) b
WHERE get(split(regexp_extract(`@message`, 'Missing_Items: (.*)'), '[,]'), b.index) <> ''
AND b.index < b.max_missing_items
regexp_extract(`@message`, 'Missing_Items: (.*)')
will extract the list of missing items for one log line. For example: Item1, Item2, Item3
.split(..., '[,]')
will build an array of the missing items from above: [Item1, Item2, Item3]
.get(..., b.index)
will get the b.index
element of the array. b.index
start from 0
to the maximum size of missing items array.SELECT
generates the sequence that will be used to get elements in the array of missing items. Note that you will need to have more logs line than the maximum number of missing items array. That should not be a problem unless your maximum number of missing items is big, in that case you should validate that value.regexp_count(..., '[^,]+')
will count the number of elements in the missing items list. Here Item1, Item2, Item3
will give 3
. I didn't find any useable size
functions to get the size of an array.That query could have been much easier if Generator functions were supported by CloudWatch Logs
. The explode function for example would do it directly (Something like SELECT distinct explode(split(regexp_extract(`@message`, 'Missing_Items: (.*)'), '[,]')) as missingItem FROM `LogGroupA`
).
More details on supported OpenSearch SQL commands and functions here.