Tags: python, hbase, happybase

HappyBase and Atomic Batch Inserts for HBase


With the HappyBase API for HBase in Python, a batch insert can be performed as follows:

import happybase
connection = happybase.Connection()
table = connection.table('table-name')
batch = table.batch()
# put several rows to this batch via batch.put()
batch.send()

What would happen if this batch failed halfway through? Would the rows that had already been saved remain saved, while the remaining rows were simply never written?

I noticed in the HappyBase GitHub repository that the table.batch() method takes transaction and wal as parameters. Could these be configured so as to roll back the successfully saved rows in the event the batch fails halfway through?

Will happybase throw an exception here, which would permit me to take note of the row keys and perform a batch delete?
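A rough sketch of that compensating-delete idea (the column family 'cf', the sample rows, and the broad exception handling are assumptions on my part; happybase itself does not roll anything back for you):

import happybase

connection = happybase.Connection()
table = connection.table('table-name')

rows = {'row-1': {'cf:col': 'a'}, 'row-2': {'cf:col': 'b'}}  # 'cf' is an assumed column family

batch = table.batch()
for key, data in rows.items():
    batch.put(key, data)

try:
    batch.send()
except Exception:
    # Best-effort clean-up: delete every row key that was part of the failed batch.
    # Some of these deletes may target rows that were never actually written.
    cleanup = table.batch()
    for key in rows:
        cleanup.delete(key)
    cleanup.send()
    raise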


Solution

  • Did you follow the tutorial about batch mutations in the Happybase docs? It looks like you're mixing up a few things here. https://happybase.readthedocs.org/en/latest/user.html#performing-batch-mutations

    Batches are purely a performance optimization: they avoid round-tripping to the Thrift server for each row that is stored/deleted, which may result in a significant speedup.

    The context manager behaviour (the with block), as explained with numerous examples in the user guide linked above, is a purely client-side convenience API that makes application code easier to write and reason about. If the with block completes successfully all mutations are sent to the server in one go.

    However... that's only the happy path. What should happen if a Python exception is raised from inside the with block? That's where the transaction flag comes into play: if True, no data is sent to the server at all; if False, any pending data is flushed anyway. Which behaviour is preferable depends strongly on your use case, as the sketch below illustrates.
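
    A minimal sketch of both behaviours, assuming a Thrift server on localhost and an existing table 'table-name' with a column family 'cf' (both assumptions; on Python 3, use byte strings for row keys and values):

    import happybase

    connection = happybase.Connection()     # defaults to a Thrift server on localhost:9090
    table = connection.table('table-name')

    # transaction=True: if the with block raises, nothing is sent to the server.
    try:
        with table.batch(transaction=True) as batch:
            batch.put('row-1', {'cf:col': 'value-1'})
            batch.put('row-2', {'cf:col': 'value-2'})
            raise RuntimeError('simulated failure')  # neither row reaches HBase
    except RuntimeError:
        pass

    # transaction=False (the default): pending mutations are flushed anyway when the
    # with block exits, so 'row-3' below does end up in HBase despite the exception.
    try:
        with table.batch(transaction=False) as batch:
            batch.put('row-3', {'cf:col': 'value-3'})
            raise RuntimeError('simulated failure')
    except RuntimeError:
        pass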