java hbase crud hbase-shell

Not able to delete a row that has timestamp Long.MAX_VALUE from HBase


Somehow I ended up adding a row to HBase with its timestamp set to Long.MAX_VALUE. Now I can neither put to the same key with a timestamp lower than Long.MAX_VALUE nor delete the inserted row.

What caused this issue

I successfully added this row (out of curiosity). Is it bad to use a timestamp value like this?

mm21000000000:422021000000000     column=s:fe:k, timestamp=9223370481975138807, value=m21000000000
mm21000000000:422021000000000     column=s:fe:m21000000000, timestamp=9223370481975138807, value=\x01

Now, if I try to add one more column to the same row like this, either without a timestamp or with a timestamp lower than Long.MAX_VALUE, it does not work. Of course, if I add a new row with a different key, it works.

put 'ue_combo','mm21000000000:422021000000000','s:le:k','3422021000000000' # I'll be adding this via Java
// Briefly, in Java:
put.addColumn("s".getBytes(), "le:k".getBytes(), ts, "3422021000000000".getBytes()); // ts is the timestamp

Finally, I decided to get rid of that row and tried deleting it by its key, and also with ROWPREFIXFILTER, as shown below:

deleteall 'table_name', 'mm21000000000:422021000000000'
deleteall 'table_name', {ROWPREFIXFILTER => 'mm'}

Neither command deletes the row.

A solution via a shell command or through the HBase Java API would help. I was able to narrow down the issue but do not understand it thoroughly. Does HBase treat a row added with timestamp Long.MAX_VALUE as one that will be added at a future time? Does HBase only allow timestamps in increasing order, meaning that once a maximal timestamp is stored, nothing with a lower timestamp can be inserted?

I found this mail-archive thread. I didn't understand much of it, but I think the author overrides HBase code, which is not possible in my case.

If you need any more information, please mention it in the comments.


Solution

  • As far as I understand this HBase behaviour and the concepts behind it, the main problem is that Long.MAX_VALUE is a special value that is replaced with the current time on the server side.

    By design, Long.MAX_VALUE should not be used as a timestamp for any row, but HBase does not guard against it.

    If you look at the HBase client code, e.g. Put, you can see that it uses HConstants.LATEST_TIMESTAMP as the default timestamp for every put, which means that the server should use the current timestamp.

      public Put(byte [] row) {
        this(row, HConstants.LATEST_TIMESTAMP);
      }
    

    Also, you can read this comment on the LATEST_TIMESTAMP constant, which gives a detailed description of deleting rows.

    The solution provided in the mailing-list archive is based on manually rewriting the HFile with a delete marker for the desired row. The thread also contains a gist with example code: https://www.mail-archive.com/user@hbase.apache.org/msg44615.html
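
To make the timestamp behaviour described above more concrete, here is a minimal sketch in plain Java. It is a toy model, not HBase server code: the class `TimestampModel` and its methods `resolve` and `isMasked` are hypothetical names made up for illustration. It assumes two things: that the server replaces the placeholder `HConstants.LATEST_TIMESTAMP` (which equals `Long.MAX_VALUE`) with its own clock, and that a delete marker only hides cells whose timestamp is less than or equal to the marker's timestamp. The cell timestamp is the one shown in the scan output in the question.

```java
/** Toy model of HBase timestamp handling; hypothetical, not the real server code. */
final class TimestampModel {
    // Same value as HConstants.LATEST_TIMESTAMP in the HBase client.
    static final long LATEST_TIMESTAMP = Long.MAX_VALUE;

    /** Assumption: the server swaps the LATEST_TIMESTAMP placeholder for its own clock. */
    static long resolve(long requestedTs, long serverNow) {
        return requestedTs == LATEST_TIMESTAMP ? serverNow : requestedTs;
    }

    /** Assumption: a delete marker hides cells with timestamp <= the marker's timestamp. */
    static boolean isMasked(long cellTs, long deleteMarkerTs) {
        return cellTs <= deleteMarkerTs;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // Timestamp of the stored cell, copied from the scan output in the question.
        long cellTs = 9223370481975138807L;

        // A delete without an explicit timestamp is stamped with the server's "now",
        // which is far below the cell's timestamp, so the cell stays visible.
        long defaultDeleteTs = resolve(LATEST_TIMESTAMP, now);
        System.out.println("default delete masks cell:  " + isMasked(cellTs, defaultDeleteTs));

        // A delete stamped with the cell's own timestamp would cover it in this model.
        long explicitDeleteTs = resolve(cellTs, now);
        System.out.println("explicit delete masks cell: " + isMasked(cellTs, explicitDeleteTs));
    }
}
```

In this model a plain `deleteall` never reaches the far-future cell, which matches what the question reports. If the real server behaves the same way, a delete issued with an explicit timestamp at or above the cell's own (for example via the `Delete(byte[] row, long timestamp)` constructor) would be expected to cover it, but that is something to verify on a test table first.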