I have a requirement to fetch all records from an amazon QLDB based on the given year.
Here is my data inside the Revenues Table.
{
ownerId: "u102john2021",
transactionId: "tran010101010101",
timeStamp: 2021-06-11T19:31:31.000Z
}
{
ownerId: "u102john2021",
transactionId: "tran010101010101",
timeStamp: 2020-06-11T19:31:31.000Z
}
If I pass the year 2020 I want to select relevant records.
How can I write a select query on this?
To immediately answer your question, there are a couple of ways that you can achieve what you're trying to do, based on the ION data type of the timeStamp field.
1/ If the data type is of the timestamps type i.e
{
'ownerId' : 'A',
'transactionId' : 't1',
'timeStamp' : `2021-06-11T19:31:31.000Z`
},
{
'ownerId' : 'B',
'transactionId' : 't2',
'timeStamp' : `2020-06-11T19:31:31.000Z`
}
You can use a WHERE
clause that sets the boundaries of the SELECT
statement i.e
SELECT * FROM revenues WHERE "timeStamp" < `2021T` AND "timeStamp" >= `2020T`
Note that I've placed the timeStamp field in double quotation marks because it is a reserved keyword: https://docs.aws.amazon.com/qldb/latest/developerguide/ql-reference.reserved.html.
2/ If the data type is of the string type i.e
{
'ownerId' : 'C',
'transactionId' : 't3',
'timeStamp' : '2021-06-11T19:31:31.000Z'
},
{
'ownerId' : 'D',
'transactionId' : 't4',
'timeStamp' : '2020-06-11T19:31:31.000Z'
}
You can use a WHERE
clause with the LIKE
operator to match a pattern i.e
SELECT * FROM revenues WHERE "timeStamp" LIKE "2020%"
I'd like to mention that though these queries will achieve what you want them to, they are not optimised for QLDB and as the size of the data set grows, there will be significant performance problems in the form of query latency, transaction timeouts, and concurrency conflicts. The reason for this is that QLDB performs a full table scan unless a predicate with an equality check against an indexed field is provided e.g
SELECT * FROM revenues WHERE "timeStamp" = `2021-06-11T19:31:31.000Z`
Scan queries face high latency that increases with the amount of data that has to be examined. The queries provided will result in scans in order to determine the right documents to return that fit the ranges.
With the increase in latency, another aspect that has to be considered is the QLDB transaction timeout of 30 seconds. All queries in QLDB are transactions with serializable isolation, including SELECT
statements. As the scan latency goes up with increase in the data set, the transaction timeout will inevitably be triggered and the query will error.
Ideally, you should run statements with a WHERE
predicate clause that filters on an indexed field or a document ID. For more information on optimal queries for QLDB, please see: https://docs.aws.amazon.com/qldb/latest/developerguide/working.optimize.html.
For running such scans as provided above, we recommend streaming the data to a purpose-built database service of your choice that is optimized for analytical use cases.