hadoophivesql-delete

How to delete and update a record in Hive


I have installed Hadoop, Hive, Hive JDBC. which are running fine for me. But I still have a problem. How to delete or update a single record using Hive because delete or update command of MySQL is not working in Hive.

hive> delete from student where id=1;
Usage: delete [FILE|JAR|ARCHIVE] <value> [<value>]*
Query returned non-zero code: 1, cause: null

Solution

  • You should not think about Hive as a regular RDBMS, Hive is better suited for batch processing over very large sets of immutable data.

    The following applies to versions prior to Hive 0.14, see the answer by ashtonium for later versions.

    There is no operation supported for deletion or update of a particular record or particular set of records, and to me this is more a sign of a poor schema.

    Here is what you can find in the official documentation:

    Hadoop is a batch processing system and Hadoop jobs tend to have high latency and
    incur substantial overheads in job submission and scheduling. As a result -
    latency for Hive queries is generally very high (minutes) even when data sets
    involved are very small (say a few hundred megabytes). As a result it cannot be
    compared with systems such as Oracle where analyses are conducted on a
    significantly smaller amount of data but the analyses proceed much more
    iteratively with the response times between iterations being less than a few
    minutes. Hive aims to provide acceptable (but not optimal) latency for
    interactive data browsing, queries over small data sets or test queries.
    
    Hive is not designed for online transaction processing and does not offer
    real-time queries and row level updates. It is best used for batch jobs over
    large sets of immutable data (like web logs).
    

    A way to work around this limitation is to use partitions: I don't know what you id corresponds to, but if you're getting different batches of ids separately, you could redesign your table so that it is partitioned by id, and then you would be able to easily drop partitions for the ids you want to get rid of.