What is the most efficient way to fetch only the row keys in a range in Bigtable?
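Context: I'm trying to delete a range of rows in Bigtable. For that, there are two options:
1. Use the admin API's DropRowRange, which drops every row sharing a given row key prefix.
2. Fetch the row keys in the range and delete each of those rows with a mutation.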
The second approach requires a fetch, and this is where I'm a bit stumped. I want to fetch all the row keys but no cell data at all. The solution suggested in many places is: "Add a modifying filter to strip the value"
From my experiments, it seems this does not remove the entire cell, just its value. Am I wrong here? My rows have almost a hundred cells each, so retrieving them, even without values, runs me out of memory quickly.
I found another solution: limiting cellsPerRow to 1. This feels like a hack.
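import com.google.cloud.bigtable.data.v2.models.Filters.FILTERS
import com.google.cloud.bigtable.data.v2.models.Query
// A Query holds one filter, so a second filter() call would replace the first; chain them instead
val query = Query.create(tableId).filter(
    FILTERS.chain()
        .filter(FILTERS.limit().cellsPerRow(1)) // keep only the first cell of each row
        .filter(FILTERS.value().strip())        // suggested by others, but not enough on its own
)
bigtableDataClient.readRows(query)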
Is there a better solution?
For your goals of deleting several ranges of rows in your table, you have correctly evaluated the options.
DropRowRange is limited to 5,000 calls per day per project, so if the number of ranges you need to delete fits within that limit, I would recommend using it.
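A minimal Kotlin sketch of the DropRowRange route, assuming the Java admin client (com.google.cloud.bigtable.admin.v2) is on the classpath. The function name dropRowsWithPrefix and the empty-prefix guard are hypothetical additions for illustration, not part of the client. Keep in mind that DropRowRange targets a row key prefix (or the whole table), so it fits best when each range you want gone shares a common prefix.

```kotlin
import com.google.cloud.bigtable.admin.v2.BigtableTableAdminClient

/**
 * Hypothetical helper: drops every row whose key starts with [rowKeyPrefix]
 * using a single DropRowRange admin call.
 */
fun dropRowsWithPrefix(projectId: String, instanceId: String, tableId: String, rowKeyPrefix: String) {
    // Guard chosen for this sketch: refuse an empty prefix so the helper
    // is never pointed at "everything" by accident.
    require(rowKeyPrefix.isNotEmpty()) { "rowKeyPrefix must not be empty" }

    // The admin client is AutoCloseable, so use {} closes its channels afterwards.
    BigtableTableAdminClient.create(projectId, instanceId).use { admin ->
        // DropRowRange takes a key prefix, not an arbitrary [start, end) range.
        admin.dropRowRange(tableId, rowKeyPrefix)
    }
}
```

Each call to this helper counts against the daily DropRowRange limit mentioned above, so it suits a modest number of prefixes rather than one call per row.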
The other solution, querying the rows with a maximum of one cell per row and stripping the value, is also viable and definitely not a hack; we recommend the same technique for counting rows in Bigtable. Getting one cell per row gives you the row key you're looking for without filling your memory with the roughly hundred cells per row you mentioned. Stripping the value on top of that is worthwhile when that single cell holds a lot of data, and in general it reduces the amount of data sent over the network. Combine the two with Filters.FILTERS.chain(), because a Query holds a single filter and each call to filter() replaces the previous one.
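A sketch of the fetch-then-delete route in the same Kotlin style as the question, assuming the Java data client (com.google.cloud.bigtable.data.v2) and string row keys. ROW_KEYS_ONLY and deleteRowRange are hypothetical names for illustration. The keys are streamed from readRows rather than collected into a list, and the deletes go through the client's bulk mutation batcher instead of one RPC per row.

```kotlin
import com.google.cloud.bigtable.data.v2.BigtableDataClient
import com.google.cloud.bigtable.data.v2.models.Filters
import com.google.cloud.bigtable.data.v2.models.Filters.FILTERS
import com.google.cloud.bigtable.data.v2.models.Query
import com.google.cloud.bigtable.data.v2.models.RowMutationEntry

// Chained filters apply in order: keep at most one cell per row, then
// blank out that cell's value, so each row arrives as its key plus one empty cell.
val ROW_KEYS_ONLY: Filters.Filter = FILTERS.chain()
    .filter(FILTERS.limit().cellsPerRow(1))
    .filter(FILTERS.value().strip())

/**
 * Hypothetical helper: deletes every row with a key in [startKey, endKey)
 * and returns how many delete mutations were queued.
 */
fun deleteRowRange(client: BigtableDataClient, tableId: String, startKey: String, endKey: String): Long {
    // Query.range(start, end) includes start and excludes end.
    val query = Query.create(tableId)
        .range(startKey, endKey)
        .filter(ROW_KEYS_ONLY)

    var queued = 0L
    // The batcher groups entries into MutateRows requests; closing it
    // (via use) flushes the remaining entries and waits for them.
    client.newBulkMutationBatcher(tableId).use { batcher ->
        for (row in client.readRows(query)) {
            batcher.add(RowMutationEntry.create(row.key).deleteRow())
            queued++
        }
    }
    return queued
}
```

Usage might look like deleteRowRange(bigtableDataClient, tableId, "user#100", "user#200"), where the keys are placeholders. Only one stripped cell per row is held in memory at a time, which addresses the out-of-memory problem from the question.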
I hope this clears up the solution and you can proceed with whichever will best suit your use case.