hivehiveqlatomicacid

What happens when a hive insert is failed halfway?


Suppose an insert is expected to load 100 records in hive and 40 records have been inserted and the insert failed for some reason. will the transaction roll back completely, undoing 40 records which were inserted? or Will we see 40 records in the hive table even after the insert query failed?


Solution

  • The operation is atomic (even for non-ACID table): If you inserting or rewriting data using HiveQL, it writes data into temporary location and only if the command succeeds files are moved to the table location (old files are deleted in case of INSERT OVERWRITE). If SQL statement fails the data remains as it was before statement execution.

    Note about S3 direct writes: Direct writes to S3 feature should be disabled to allow Hive to write to temporary location and rewrite target folder only if operation succeeded:

    -- Disable AWS S3 direct writes:
    set hive.allow.move.on.s3=true; 
    

    Read also this documentation for more details on which ACID features supported in concurrency mode and limitations: What is ACID and why should you use it?

    Up until Hive 0.13, atomicity, consistency, and durability were provided at the partition level. Isolation could be provided by turning on one of the available locking mechanisms (ZooKeeper or in memory). With the addition of transactions in Hive 0.13 it is now possible to provide full ACID semantics at the row level, so that one application can add rows while another reads from the same partition without interfering with each other.

    Also read this about Hive locks with ACID enabled (transactional and non-transactional tables)

    Update: Since DEC 2020 Amazon S3 is strongly consistent at no extra charge. So, removed part about S3 eventual consistency.