
Deciding whether to use column versions or new rows in GCP Bigtable


Let’s say I want to store a snapshot of a user’s settings every time they make a change. These snapshots would let the user see a history of the changes they made to their settings.

Would it be better to write to a new row for every change (Method A) or to write to the same row for every change (Method B)? Am I missing any pros/cons to either method?

The examples below show a user (userId123) making these changes:

  1. Turn notifications ON
  2. Turn notifications OFF
  3. Turn notifications ON

Method A:

The key would be the userId + timestamp, so every change is written to a new row.
Queries would scan by row prefix (userId) and read multiple rows to show the history.

key                   column data                 column version
userId123:20220401    { notifications: true }     1
userId123:20220402    { notifications: false }    1
userId123:20220403    { notifications: true }     1
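A minimal sketch of Method A, assuming an in-memory stand-in for a Bigtable table rather than the real client. Bigtable keeps rows sorted lexicographically by row key, so a prefix scan is a contiguous range read. The `SortedRowStore` class and the `history_key` helper are hypothetical. Ending the scan prefix with the `:` separator keeps `userId123:` from also matching rows that belong to a user such as `userId1234`.

```python
import bisect


class SortedRowStore:
    """Hypothetical in-memory stand-in for a Bigtable table: rows kept sorted by key."""

    def __init__(self):
        self._keys = []  # row keys in lexicographic order, as Bigtable stores them
        self._rows = {}  # row key -> row data

    def write_row(self, key, data):
        # Insert the key in sorted position the first time we see it.
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = data

    def scan_prefix(self, prefix):
        # A prefix scan starts at the first key >= prefix and stops at the
        # first key that no longer starts with the prefix.
        start = bisect.bisect_left(self._keys, prefix)
        results = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            results.append((key, self._rows[key]))
        return results


def history_key(user_id, date):
    # Method A row key: userId + ":" + timestamp (hypothetical helper).
    return f"{user_id}:{date}"


store = SortedRowStore()
store.write_row(history_key("userId123", "20220401"), {"notifications": True})
store.write_row(history_key("userId123", "20220402"), {"notifications": False})
store.write_row(history_key("userId123", "20220403"), {"notifications": True})
# A different user whose id shares the "userId123" prefix.
store.write_row(history_key("userId1234", "20220401"), {"notifications": False})

# The separator in the prefix excludes userId1234's row.
history = store.scan_prefix("userId123:")
```

Because rows come back in key order, this history is oldest first. With a day-granularity key like `20220401`, two changes on the same day would land in the same row rather than in separate rows, so a finer-grained timestamp would likely be needed in practice.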

pros:

cons:


Method B:

The key would just be the userId, so every change is written to the same row, creating a new column version.
Queries would look up by row key (userId) and get a single row with multiple column versions for the history.

key          column data                 column version
userId123    { notifications: true }     1
             { notifications: false }    2
             { notifications: true }     3
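A minimal in-memory sketch of Method B, assuming a hypothetical `VersionedRow` class in place of the real client. Each write adds a new timestamped version of the same cell, and reads return versions newest first, which is the order Bigtable returns cell versions in. The integer timestamps simply mirror the example dates; real Bigtable timestamps are microseconds since the Unix epoch.

```python
from collections import defaultdict


class VersionedRow:
    """Hypothetical in-memory stand-in for one Bigtable row with versioned cells."""

    def __init__(self):
        # column qualifier -> list of (timestamp, value), kept newest first
        self._cells = defaultdict(list)

    def set_cell(self, column, value, timestamp):
        # Writing at an existing timestamp replaces that version; otherwise
        # the write adds a new version alongside the older ones.
        versions = [cell for cell in self._cells[column] if cell[0] != timestamp]
        versions.append((timestamp, value))
        versions.sort(key=lambda cell: cell[0], reverse=True)
        self._cells[column] = versions

    def read(self, column, max_versions=None):
        # Return a copy so callers cannot mutate the stored versions.
        versions = list(self._cells[column])
        return versions if max_versions is None else versions[:max_versions]


table = defaultdict(VersionedRow)  # row key -> row
row = table["userId123"]           # Method B: one row per user
row.set_cell("settings", {"notifications": True}, 20220401)
row.set_cell("settings", {"notifications": False}, 20220402)
row.set_cell("settings", {"notifications": True}, 20220403)

history = row.read("settings")
# [(20220403, {'notifications': True}), (20220402, {'notifications': False}), (20220401, {'notifications': True})]
```

A single point lookup by row key returns the whole history, and `max_versions` shows how a read can be limited to the most recent changes.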

pros:

cons:


Solution

  • This scenario lends itself better to storing the changes in one row with multiple versions. You could even set up a garbage collection rule to delete any changes that are more than one year old, as you mentioned.

    One way to optimize this method further: if in some scenarios you only need the latest value or the latest 5 changes, you can add a filter to the query so it returns only those 5 versions, which saves on network costs.
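The two ideas in this answer can be modeled in memory as a rough sketch: `apply_max_age_gc` mimics a one-year max-age garbage collection rule, and `latest_versions` mimics a filter that keeps only the newest N versions of a column. Both functions are hypothetical stand-ins; in the Python client (`google-cloud-bigtable`) the corresponding real pieces are `column_family.MaxAgeGCRule` and `row_filters.CellsColumnLimitFilter`. Unlike this model, Bigtable garbage collection runs asynchronously, so expired versions can still appear in reads until they are actually removed.

```python
from datetime import datetime, timedelta


def apply_max_age_gc(versions, now, max_age):
    """Hypothetical model of a max-age GC rule: drop versions older than max_age."""
    cutoff = now - max_age
    return [(ts, value) for ts, value in versions if ts >= cutoff]


def latest_versions(versions, limit):
    """Hypothetical model of a cells-per-column limit filter: keep the newest `limit` versions."""
    newest_first = sorted(versions, key=lambda cell: cell[0], reverse=True)
    return newest_first[:limit]


# Versions of the user's settings cell, using the dates from the question.
versions = [
    (datetime(2022, 4, 1), {"notifications": True}),
    (datetime(2022, 4, 2), {"notifications": False}),
    (datetime(2022, 4, 3), {"notifications": True}),
]

now = datetime(2023, 4, 1, 12)  # hypothetical "current" time
kept = apply_max_age_gc(versions, now, timedelta(days=365))
last_five = latest_versions(kept, 5)
```

Here the version from 2022-04-01 falls outside the one-year window and is dropped, and the limit returns the remaining changes newest first.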