I'm wondering what the best way is to design tables in QLDB, and whether it's better to perform joins or to use nested documents.
For example, say I have the tables transaction and payment, where a payment must be associated with a transaction. Which of the following options is better?
Nested Document Option (One table)
{
'payment_reference': 'abc123',
'transaction': {
'id': 123,
'name': 'John Doe',
'amount': '$10'
},
'fees': '$2',
'amount_paid': '$12'
}
Two Table Option
Payment Document
{
'payment_reference': 'abc123',
'transaction_id': 123,
'fees': '$2',
'amount_paid': '$12'
}
Transaction Document
{
'id': 123,
'amount': '$10',
'name': 'John Doe'
}
I think @Aurgho has answered your question, but I'm going to add my general thoughts, building on what Aurgho said, in case they help others who come to this post with a similar question.
There are multiple factors that can influence your design decision, including the quotas and limits QLDB imposes. Here are a few pointers that might help you think it through:
Query Pattern: At this point, Amazon QLDB allows indexes to be created only on top-level fields. In the nested document design (Option #1), if your queries filter on any field of the nested document, those queries can't use an index and will perform scans instead, which can hurt performance. With Option #2, you can have indexes on both tables and use those indexed fields in your join criteria.
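To make the index point concrete, here is a minimal Python sketch of the PartiQL involved. It assumes hypothetical table names Payments and Transactions and the field names from the question. It does not import the QLDB driver: find_payment_with_transaction only expects an object with an execute_statement(statement, *parameters) method, which is the shape of the executor the pyqldb driver passes to execute_lambda callbacks. Treat the statements as an illustration, not a tuned query.

```python
from typing import Any, Iterable, List, Protocol

# Hypothetical table names; the question calls them "transaction" and "payment".
PAYMENTS = "Payments"
TRANSACTIONS = "Transactions"

# Option #2: every field used for lookups or joins is top-level, so it can be indexed.
INDEX_STATEMENTS = [
    f"CREATE INDEX ON {PAYMENTS} (payment_reference)",
    f"CREATE INDEX ON {PAYMENTS} (transaction_id)",
    f"CREATE INDEX ON {TRANSACTIONS} (id)",
]

# Option #2 lookup: equality on an indexed field, joined on indexed fields.
JOIN_QUERY = (
    "SELECT p.payment_reference, p.amount_paid, t.name, t.amount "
    f"FROM {PAYMENTS} AS p, {TRANSACTIONS} AS t "
    "WHERE p.payment_reference = ? AND p.transaction_id = t.id"
)

# Option #1 lookup on a nested field: no index can cover "transaction".id,
# so QLDB would have to scan. "transaction" is quoted in case it clashes
# with a reserved word.
NESTED_QUERY = f'SELECT * FROM {PAYMENTS} AS p WHERE p."transaction".id = ?'


class Executor(Protocol):
    """Anything with an execute_statement(statement, *parameters) method."""

    def execute_statement(self, statement: str, *parameters: Any) -> Iterable[Any]: ...


def find_payment_with_transaction(executor: Executor, reference: str) -> List[Any]:
    """Run the Option #2 join with a bound parameter and materialize the rows."""
    return list(executor.execute_statement(JOIN_QUERY, reference))
```

With the real driver this would be called along the lines of QldbDriver(ledger_name="my-ledger").execute_lambda(lambda ex: find_payment_with_transaction(ex, "abc123")), where "my-ledger" is a placeholder. The CREATE INDEX statements would be run once, separately, when setting up the tables.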
Access pattern: Are you going to have significantly more writes than reads? If your reads are sparse and can tolerate slightly higher latency, Option #1 might be better from a data modeling perspective, since all the payment-related information is captured in a single document. On the other hand, if you have a lot more reads and those reads are latency sensitive, you should evaluate your options from the query-pattern perspective above.
Quotas and Limits: Amazon QLDB has a quota on document size (currently 128 KB): https://docs.aws.amazon.com/qldb/latest/developerguide/limits.html#limits.fixed. If you plan to add more fields over time, the per-document size keeps growing with the nested fields, and you might eventually run into the document size limit. There are other quotas too that can affect your decision, depending on your use case.
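If you do go with nested documents, it can help to keep an eye on document size as the schema grows. Below is a minimal sketch, assuming compact JSON length is an acceptable stand-in for size: QLDB actually stores documents as Amazon Ion, so the real size will differ, and the 80% threshold in the hypothetical near_limit helper is an arbitrary choice.

```python
import json

# Documented QLDB per-document limit, assuming "128 KB" means 128 * 1024 bytes.
MAX_DOCUMENT_BYTES = 128 * 1024


def approx_document_size(doc: dict) -> int:
    """Rough size of a document, measured as compact UTF-8 JSON.

    QLDB stores documents as Amazon Ion, so this is only a proxy for the size
    QLDB actually checks; use it as an early warning, not an exact figure.
    """
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


def near_limit(doc: dict, threshold: float = 0.8, limit: int = MAX_DOCUMENT_BYTES) -> bool:
    """True once the estimated size reaches `threshold` of the limit."""
    return approx_document_size(doc) >= threshold * limit
```

Running a check like this in tests, or before each insert, gives some warning before a growing nested document starts being rejected.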
Generally speaking, if you are not going to query on a field in the nested document, and/or your writes far outnumber your reads, and/or your reads are not very latency sensitive, and/or your document size will stay within the currently imposed limits, you could go with Option #1. Having all your data in one document can make things easier at the application layer, both when you push the data into QLDB (just one insert) and when you process the documents in your code, but you will have to choose your trade-offs carefully.
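The application-layer difference is easiest to see side by side. This is a minimal sketch, again assuming hypothetical Payments and Transactions tables and an executor object like the one pyqldb hands to execute_lambda callbacks; split_payment is a hypothetical helper that derives the Option #2 documents from the Option #1 shape.

```python
from typing import Any, Dict, Tuple


def split_payment(nested: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Turn an Option #1 document into the Option #2 transaction/payment pair."""
    transaction = dict(nested["transaction"])
    payment = {k: v for k, v in nested.items() if k != "transaction"}
    payment["transaction_id"] = transaction["id"]  # the join key
    return transaction, payment


def insert_nested(executor: Any, nested: Dict[str, Any]) -> None:
    # Option #1: the whole payment, transaction included, is a single INSERT.
    executor.execute_statement("INSERT INTO Payments ?", nested)


def insert_split(executor: Any, nested: Dict[str, Any]) -> None:
    # Option #2: two INSERTs. Issued from the same execute_lambda callback,
    # they run in one QLDB transaction and commit or fail together.
    transaction, payment = split_payment(nested)
    executor.execute_statement("INSERT INTO Transactions ?", transaction)
    executor.execute_statement("INSERT INTO Payments ?", payment)
```

Neither version is hard to write, but Option #2 adds the split step and a second statement to every write path, which is the extra application-layer work the trade-off refers to.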
These are just general pointers to help you think it through. You could have other use cases where one design option becomes more convincing than the other, and you can trade off certain advantages and disadvantages between the two.
Also, QLDB publishes recommendations for optimizing query performance, which can further inform your decision: https://docs.aws.amazon.com/qldb/latest/developerguide/working.optimize.html