Tags: timestamp, hbase, hbase-shell

Delete all data from HBase table according to time range?


I am trying to delete all data in an HBase table whose timestamp is older than a specified timestamp. This applies to all column families and all rows.

Is there a way to do this using the shell as well as the Java API?


Solution

  • HBase has no delete marker that covers a range of rows. A delete marker applies within a single row, so to delete cells across many rows you have to place a delete on every affected row, which means scanning the table, either on the client side or on the server side. That leaves two options:

    1. BulkDeleteProtocol: This uses a coprocessor endpoint, so the complete operation runs on the server side. The link has an example of how to use it. A web search will quickly show how to enable a coprocessor endpoint in HBase.
    2. Scan and delete: This is the cleanest and easiest option. Since you need to delete all column families older than a particular timestamp, the scan and delete operation can be optimized greatly by using server-side filtering to read only the first key of each row.

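      import java.util.ArrayList;
      import java.util.List;
      import org.apache.hadoop.hbase.client.Delete;
      import org.apache.hadoop.hbase.client.Result;
      import org.apache.hadoop.hbase.client.ResultScanner;
      import org.apache.hadoop.hbase.client.Scan;
      import org.apache.hadoop.hbase.filter.FilterList;
      import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
      import org.apache.hadoop.hbase.filter.KeyOnlyFilter;

      // Assumes `table` is an already opened HBase table handle
      // and STOP_TS is the cutoff timestamp (a long).
      Scan scan = new Scan();
      // The time range is [min, max): matches rows holding a cell older than STOP_TS
      scan.setTimeRange(0, STOP_TS);
      // Crucial optimization: Make sure you process multiple rows together
      scan.setCaching(1000);
      // Crucial optimization: Retrieve only row keys (first key per row, no values)
      FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL,
          new FirstKeyOnlyFilter(), new KeyOnlyFilter());
      scan.setFilter(filters);
      List<Delete> deletes = new ArrayList<>(1000);
      // try-with-resources closes the scanner even if a delete throws
      try (ResultScanner scanner = table.getScanner(scan)) {
        Result[] rr;
        // We set caching to 1000 above
        // make full use of it and get next 1000 rows in one go
        while ((rr = scanner.next(1000)).length > 0) {
          for (Result r : rr) {
            // A row-level Delete removes cells with timestamp <= the given value,
            // so STOP_TS - 1 removes only the cells strictly older than STOP_TS
            deletes.add(new Delete(r.getRow(), STOP_TS - 1));
          }
          table.delete(deletes);
          deletes.clear();
        }
      }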
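
One detail worth checking in the scan and delete approach is how the two timestamp bounds line up. Scan.setTimeRange(min, max) treats max as exclusive, while a Delete built from a row key and a timestamp removes cells whose timestamp is less than or equal to that value. The sketch below is only a toy model in plain Java, with no HBase classes involved; the class and method names are hypothetical and exist purely for illustration. It shows which cells of a single row would survive for a given delete timestamp.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: plain Java, does not call any HBase API. */
final class TimestampBoundarySketch {

    /** Models a scan time range: min is inclusive, max is exclusive. */
    static boolean inScanRange(long cellTs, long minStamp, long maxStamp) {
        return cellTs >= minStamp && cellTs < maxStamp;
    }

    /** Models a row-level delete at deleteTs: masks cells with ts <= deleteTs. */
    static boolean maskedByDelete(long cellTs, long deleteTs) {
        return cellTs <= deleteTs;
    }

    /**
     * Returns the cell timestamps of one row that survive a scan-and-delete
     * pass: the row is deleted from only if the scan [0, stopTs) matched it.
     */
    static List<Long> survivors(long[] cellTimestamps, long stopTs, long deleteTs) {
        boolean rowMatched = false;
        for (long ts : cellTimestamps) {
            if (inScanRange(ts, 0L, stopTs)) {
                rowMatched = true;
                break;
            }
        }
        List<Long> kept = new ArrayList<>();
        for (long ts : cellTimestamps) {
            if (!(rowMatched && maskedByDelete(ts, deleteTs))) {
                kept.add(ts);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        long stopTs = 100L;
        long[] row = {90L, 100L, 110L};
        // Delete at the cutoff itself: the cell stamped exactly 100 is removed too
        System.out.println(survivors(row, stopTs, stopTs));      // prints [110]
        // Delete at cutoff - 1: only cells strictly older than 100 are removed
        System.out.println(survivors(row, stopTs, stopTs - 1));  // prints [100, 110]
    }
}
```

With a cutoff of 100 and cells at 90, 100 and 110, passing the cutoff itself to the delete leaves only 110, while passing cutoff - 1 also keeps the cell stamped exactly 100. Which of the two is right depends on whether "older than" is meant to include the cutoff timestamp.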