google-cloud-bigtable, bigtable

Bigtable schema design views


I have a Bigtable schema design that's proving a bit hard to scale in terms of reads. I would appreciate a quick look and some advice on any bad design patterns.

Here's a quick overview:

[screenshot of design concept]

*Please note that all row keys, column families, and columns use example names.

My row key has two parts: a prefix and the actual name (e.g. resource/ABC/subresource/123). The 123 after subresource in this case is a reversed unix timestamp. We can have thousands of resources, each with millions of subresources under it. The prefix before the name helps us group resources and subresources into "collections", so all the resources live in the same place on the table, and likewise for the subresources.
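To illustrate the layout, here's a minimal Go sketch of how a key like that could be built. The helper, the collection/resource names, and the exact reversal scheme (math.MaxInt64 minus the unix timestamp, zero-padded so keys still sort lexicographically) are just for illustration, not our actual code:

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// subresourceKey is a hypothetical helper showing the key layout described
// above: a collection prefix, then the resource/subresource name, where the
// subresource ID is a reversed unix timestamp so the newest rows sort first.
// Zero-padding keeps the reversed timestamps in lexicographic order.
func subresourceKey(collection, resource string, t time.Time) string {
	reversed := math.MaxInt64 - t.Unix()
	return fmt.Sprintf("%s/resource/%s/subresource/%019d", collection, resource, reversed)
}

func main() {
	// Prints something like collectionA/resource/ABC/subresource/9223372035xxxxxxxxx
	fmt.Println(subresourceKey("collectionA", "ABC", time.Now()))
}
```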

On the column families and columns side there isn't much going on: we have a very limited number of column families (fewer than 10 in most cases). The column families act as different sources of data, and the columns hold the different fields of the resource. The number of columns/fields is also usually very small, maybe 14 at most.

On the cells, we keep only the latest timestamp, and we only care about the latest cell. Each cell stores a very small marshalled proto message. That allows us to use column filters to query only the fields we care about.
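To give a feel for that read pattern, here is a simplified sketch with the Go Bigtable client rather than our actual code; the project, instance, table, family, and column names are all placeholders:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"cloud.google.com/go/bigtable"
)

func main() {
	ctx := context.Background()
	client, err := bigtable.NewClient(ctx, "my-project", "my-instance") // placeholder names
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	tbl := client.Open("resources") // placeholder table name

	// Latest cell only, restricted to one source (family) and two fields (columns).
	filter := bigtable.ChainFilters(
		bigtable.FamilyFilter("source_a"),
		bigtable.ColumnFilter("field_1|field_2"),
		bigtable.LatestNFilter(1),
	)

	row, err := tbl.ReadRow(ctx,
		"collectionA/resource/ABC/subresource/9223372035000000000",
		bigtable.RowFilter(filter))
	if err != nil {
		log.Fatal(err)
	}
	for _, items := range row {
		for _, item := range items {
			// item.Value holds the marshalled proto for that field.
			fmt.Printf("%s -> %d bytes\n", item.Column, len(item.Value))
		}
	}
}
```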

This setup started off very well, but Bigtable CPU utilization quickly got out of hand as we started scaling rapidly. We're doing a lot of concurrent batch reads, each reading about 600 rows. Those rows are mostly distributed across resources, so we shouldn't be reading from the same spot over and over again. I initially thought the column filters we use had a lot to do with it, but even doing away with those and doing the filtering ourselves before serving to the client still results in a huge amount of CPU utilization.
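Roughly, each of those batch reads looks like the sketch below: one ReadRows call over an explicit list of row keys, keeping only the latest cell. Again, all names are placeholders and this is a simplification of what we actually run:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"cloud.google.com/go/bigtable"
)

// readBatch issues one batch read as described above: a single ReadRows call
// over an explicit list of row keys, keeping only the latest cell per column.
func readBatch(ctx context.Context, tbl *bigtable.Table, keys []string) (int, error) {
	rowsSeen := 0
	err := tbl.ReadRows(ctx, bigtable.RowList(keys),
		func(r bigtable.Row) bool {
			rowsSeen++  // unmarshal the proto cells you need from r here
			return true // keep streaming the remaining rows
		},
		bigtable.RowFilter(bigtable.LatestNFilter(1)),
	)
	return rowsSeen, err
}

func main() {
	ctx := context.Background()
	client, err := bigtable.NewClient(ctx, "my-project", "my-instance") // placeholders
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// In the real workload many such batches run concurrently, each with ~600
	// keys spread across different resources; two keys stand in for that here.
	keys := []string{
		"collectionA/resource/ABC/subresource/9223372035000000000",
		"collectionA/resource/XYZ/subresource/9223372034000000000",
	}
	n, err := readBatch(ctx, client.Open("resources"), keys)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("rows read:", n)
}
```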

Any pointers/advice or observations of bad schema design would be highly appreciated.


Solution

  • It's hard to say exactly from the information given, but everything you describe seems to make sense.

    The first thing I would check is the Key Visualizer for your table, to make sure there are no hotspots in your key space that could be causing the issue. The schema seems reasonably designed, but you never know whether some areas have a problem.

    Then I might read up a bit on Bigtable performance. For scans, you should get around 220 MB/s per node on SSD clusters. Perhaps more data is being scanned than you realize, and that is causing the high CPU load (a rough back-of-the-envelope, sketched at the end of this answer, can help check that).

    For the high CPU utilization, what other metrics are you seeing? What does the number of rows read look like? Read throughput? Read requests?
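    As a sanity check on the scan-volume point above, a quick back-of-the-envelope can help. Every number in this sketch is a made-up placeholder; substitute the values you actually see in monitoring:

```go
package main

import "fmt"

func main() {
	// All of these are illustrative assumptions -- replace with your real numbers.
	const (
		batchesPerSecond = 100    // concurrent batch reads completing each second
		rowsPerBatch     = 600    // from the description above
		bytesPerRow      = 10_000 // e.g. ~14 small proto cells plus key/metadata overhead
	)

	rowsPerSecond := batchesPerSecond * rowsPerBatch
	mbPerSecond := float64(rowsPerSecond*bytesPerRow) / 1e6

	// Prints "60000 rows/s, ~600 MB/s" with the placeholder numbers above.
	fmt.Printf("%d rows/s, ~%.0f MB/s\n", rowsPerSecond, mbPerSecond)
	// Numbers like that would already keep several SSD nodes busy, so it's
	// worth comparing your real figures against the published per-node limits.
}
```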