database, file, apache-kafka, bgp

Best way to store stream of small binary files (BGP updates)


This question may look somewhat open-ended: I am trying to gather ideas on how to implement a BGP pipeline.

I am receiving 100-1000 messages (BGP updates) per second, a few kilobytes per update, over Kafka.

I need to archive them in a binary format with some metadata for fast lookup: I periodically build a "state" of the BGP table by merging all the updates received over a certain time window, hence the need for a database.

What I have been doing until now: grouping the updates into 5-minute files (messages concatenated end to end), as is common for BGP collection tools, and adding a link to each file in a database. I see some disadvantages: it is complicated (grouping by key, managing Kafka offset commits) and offers no fine-grained selection of where to start or end.
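To give an idea, here is a minimal sketch of that grouping, assuming the kafka-python client; the topic and broker names are placeholders:

```python
# Minimal sketch of the current approach: append raw BGP updates to a
# 5-minute bucket file and commit Kafka offsets only when a bucket is
# closed. Topic and broker names are placeholders.
from kafka import KafkaConsumer  # pip install kafka-python

BUCKET_SECONDS = 300

consumer = KafkaConsumer(
    "bgp-updates",                        # hypothetical topic name
    bootstrap_servers="localhost:9092",
    enable_auto_commit=False,             # we decide when offsets are committed
)

current_bucket = None
out = None

for msg in consumer:
    bucket = (msg.timestamp // 1000) // BUCKET_SECONDS   # msg.timestamp is in ms
    if bucket != current_bucket:
        if out is not None:
            out.close()
            # commit() commits the consumer's current position, which already
            # includes the first message of the new bucket -- exactly the kind
            # of offset bookkeeping that makes this approach fiddly.
            consumer.commit()
        current_bucket = bucket
        out = open(f"updates-{bucket * BUCKET_SECONDS}.bin", "ab")
    out.write(msg.value)                  # raw update bytes, end to end
```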

What I am thinking: using a database (ClickHouse/Google Bigtable/Amazon Redshift) and inserting every single entry with its metadata and a link to the individual update stored on S3/Google Cloud Storage/local files.
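Roughly, the indexing side of that idea could look like this (a sketch assuming ClickHouse through clickhouse-driver; the table and column names are only illustrative):

```python
# Sketch of the per-update indexing idea, assuming ClickHouse and the
# clickhouse-driver package; table and column names are made up.
from datetime import datetime
from clickhouse_driver import Client  # pip install clickhouse-driver

client = Client("localhost")

client.execute("""
    CREATE TABLE IF NOT EXISTS bgp_updates (
        received   DateTime,
        collector  String,
        peer_asn   UInt32,
        prefix     String,
        object_url String        -- link to the raw update on S3/GCS/local disk
    ) ENGINE = MergeTree()
    ORDER BY (collector, received)
""")

def index_update(received: datetime, collector: str, peer_asn: int,
                 prefix: str, object_url: str) -> None:
    """Insert one metadata row pointing at the stored raw update."""
    client.execute(
        "INSERT INTO bgp_updates "
        "(received, collector, peer_asn, prefix, object_url) VALUES",
        [(received, collector, peer_asn, prefix, object_url)],
    )
```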

I am worried about download performance (most likely over HTTP), since compiling all the updates into a state may require a few thousand of those messages. Do you have experience with batch downloading like this? I also don't think storing the updates directly in the database would be optimal.
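For context, the kind of batch download I have in mind, sketched with boto3 and a thread pool (the bucket name and keys are placeholders):

```python
# Rough sketch of batch-downloading a few thousand small objects from S3
# with a thread pool; the bucket name and keys are placeholders.
from concurrent.futures import ThreadPoolExecutor

import boto3  # pip install boto3

s3 = boto3.client("s3")  # boto3 clients are safe to share across threads

def fetch(key: str) -> bytes:
    """Download one stored update."""
    return s3.get_object(Bucket="bgp-archive", Key=key)["Body"].read()

def fetch_all(keys: list[str], workers: int = 32) -> list[bytes]:
    """Download many updates concurrently, preserving the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, keys))
```

Since each object is only a few kilobytes, per-request latency dominates over bandwidth, so the level of concurrency matters more than the link speed.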

Any opinions, ideas, or suggestions? Thank you.


Solution

  • Cloud Bigtable can handle 10,000 requests per second per node and costs $0.65 per node per hour. The smallest production cluster is 3 nodes, for a total of 30,000 requests per second. Your application calls for at most 1,000 requests per second. While Cloud Bigtable can handle your workload, I would suggest that you consider Firestore instead.

    At a couple of kilobytes per message, I would also consider putting the entire value in the database rather than just the metadata, for ease of use; see the sketch below.
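A rough sketch of storing the whole update in the database, using Firestore (the collection and field names are my own assumptions):

```python
# Rough sketch of storing the raw update bytes next to the metadata in
# Firestore; the collection and field names are assumptions. A few
# kilobytes per message fits comfortably under Firestore's 1 MiB
# document size limit.
from google.cloud import firestore  # pip install google-cloud-firestore

db = firestore.Client()

def store_update(update_id: str, meta: dict, raw: bytes) -> None:
    """Write the metadata and the raw BGP update as a single document."""
    db.collection("bgp_updates").document(update_id).set({
        **meta,        # e.g. collector, peer ASN, timestamp
        "raw": raw,    # raw update bytes stored inline
    })
```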