python amazon-web-services amazon-timestream

How to populate an AWS Timestream DB?


I am trying to use AWS Timestream to store timestamped data (in Python, using boto3).

The data I need to store is the price over time of different tokens. Each record has three fields: token_address, timestamp and price. I have around 100 million records (with timestamps from 2019 to now).
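As a rough illustration of how one such CSV row could map onto a Timestream record, here is a minimal sketch. It assumes the CSV has a header `token_address,timestamp,price`, that `timestamp` is Unix epoch seconds, and it picks `token_address` as the dimension and `price` as a DOUBLE measure; these schema choices are assumptions, not something the question specifies. The record keys (`Dimensions`, `MeasureName`, `MeasureValue`, `MeasureValueType`, `Time`, `TimeUnit`) are those of the boto3 `timestream-write` `write_records` call.

```python
import csv
from typing import Dict, Iterable, Iterator


def row_to_record(row: Dict[str, str]) -> Dict:
    """Map one CSV row to a Timestream WriteRecords record (assumed schema)."""
    return {
        # token_address identifies the time series, so it becomes a dimension
        "Dimensions": [{"Name": "token_address", "Value": row["token_address"]}],
        "MeasureName": "price",
        "MeasureValue": row["price"],      # Timestream takes measure values as strings
        "MeasureValueType": "DOUBLE",
        "Time": row["timestamp"],          # assumed: Unix epoch seconds
        "TimeUnit": "SECONDS",
    }


def read_records(lines: Iterable[str]) -> Iterator[Dict]:
    """Stream records from CSV lines (e.g. an open file) without loading everything."""
    for row in csv.DictReader(lines):
        yield row_to_record(row)
```

Because `read_records` is a generator, it can be fed an open file handle for the 100 million row CSV and consumed lazily.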

I have all the data in a CSV file and I would like to populate the database with it. However, I can't find a way to do this in the documentation, because the quotas limit me to 100 records per WriteRecords request. The only optimization the documentation proposes is "Writing batches of records with common attributes", but in my case the records don't share the same values (they all have the same structure but different values, so I cannot define common_attributes as in the example).
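Even when no values are shared, the structural fields (measure name, measure type, time unit) are identical across records, so they could plausibly go in `CommonAttributes` while each record keeps only its dimension value, measure value and time. The sketch below assumes the merge behaviour described in the WriteRecords API reference (common attributes combined with each record's own fields) and the hypothetical schema from the question (dimension `token_address`, measure `price`, epoch-second timestamps). It still sends at most 100 records per request, since that quota cannot be raised. `client` is expected to be a `boto3.client("timestream-write")`; it is passed in rather than created so the sketch runs without AWS credentials.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List

MAX_RECORDS_PER_REQUEST = 100  # WriteRecords quota quoted in the question

# Fields that are the same for every price record (assumed schema).
COMMON_ATTRIBUTES = {
    "MeasureName": "price",
    "MeasureValueType": "DOUBLE",
    "TimeUnit": "SECONDS",
}


def to_record(token_address: str, timestamp: int, price: float) -> Dict:
    """Only the per-record values; the shared fields come from COMMON_ATTRIBUTES."""
    return {
        "Dimensions": [{"Name": "token_address", "Value": token_address}],
        "MeasureValue": str(price),
        "Time": str(timestamp),
    }


def chunked(items: Iterable[Dict], size: int) -> Iterator[List[Dict]]:
    """Yield lists of at most `size` items from any iterable (Python 3.8+)."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def write_all(client, database: str, table: str, records: Iterable[Dict]) -> int:
    """Send records sequentially in batches of 100 and return how many were sent."""
    sent = 0
    for batch in chunked(records, MAX_RECORDS_PER_REQUEST):
        client.write_records(
            DatabaseName=database,
            TableName=table,
            Records=batch,
            CommonAttributes=COMMON_ATTRIBUTES,
        )
        sent += len(batch)
    return sent
```

This does not get around the 100-records limit; it only shrinks each request payload by not repeating the shared fields.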

So is there a way to populate a Timestream database without writing records in batches of 100?
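With the per-request limit fixed, 100 million records means about one million WriteRecords calls, so the usual workaround is to issue many of those calls concurrently. Here is a minimal sketch using a thread pool, assuming a single boto3 low-level client shared across threads (boto3 documents low-level clients as thread-safe). The in-flight window keeps memory bounded instead of materialising every batch up front. The `workers` value is an arbitrary assumption; in practice it would be tuned against throttling, and production code would also need to handle rejected records and retries.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Set


def chunked(items: Iterable[Dict], size: int) -> Iterator[List[Dict]]:
    """Yield lists of at most `size` items from any iterable (Python 3.8+)."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def write_parallel(client, database: str, table: str, records: Iterable[Dict],
                   common_attributes: Dict, workers: int = 8,
                   batch_size: int = 100) -> int:
    """Write batches of <= batch_size records using `workers` concurrent requests."""

    def send(batch: List[Dict]) -> int:
        client.write_records(
            DatabaseName=database,
            TableName=table,
            Records=batch,
            CommonAttributes=common_attributes,
        )
        return len(batch)

    sent = 0
    in_flight: Set[Future] = set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in chunked(records, batch_size):
            in_flight.add(pool.submit(send, batch))
            # Bound the number of queued batches so the CSV is streamed, not buffered.
            if len(in_flight) >= 2 * workers:
                done, in_flight = wait(in_flight, return_when=FIRST_COMPLETED)
                sent += sum(f.result() for f in done)  # result() re-raises write errors
        done, _ = wait(in_flight)
        sent += sum(f.result() for f in done)
    return sent
```

Each request still carries at most 100 records; only the number of requests in flight changes.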


Solution

  • I asked AWS support; here is their answer:

    Unfortunately, "Records per WriteRecords API request" is a non-configurable limit. This limitation is already noted by the development team.

    However, to get any additional insights to help with your load, I have reached out to my internal team. I will get back to you as soon as I have an update from the team.

    EDIT:

    I received a new answer from AWS support:

    The team suggested that a new feature called batch load is being released, tentatively at the end of February 2023. This feature will allow customers to ingest data from CSV files directly into Timestream in bulk.