
HBase Scan Filter - Skip rows without required columns


I'm trying to put a filter on my HBase Scan object that skips rows that do not have the necessary columns filled in. I figure I should start with a skip filter, but then I get stumped: I don't see anything in the package summary about checking whether a column is present or not.

Should I use a column value filter and check whether the columns in question are null or blank? And why do filters return columns (such as ColumnCountGetFilter)? Is there a guide or something someone could point me towards to learn more about filters that isn't just a collection of javadocs?


Solution

  • You can look at the source code of the filter package.

    For example, the source code of ColumnCountGetFilter is quite short. Look at the following methods:

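    // Fields used below: limit = maximum number of columns to return,
    // count = number of KeyValues seen so far.

    @Override
    public boolean filterAllRemaining() {
      // Once more than `limit` KeyValues have been seen, nothing else is wanted.
      return this.count > this.limit;
    }
    
    @Override
    public ReturnCode filterKeyValue(KeyValue v) {
      // Count every KeyValue the scanner offers to the filter...
      this.count++;
      // ...and answer with a flag, not with data: SKIP drops this KeyValue,
      // INCLUDE lets it through to the client.
      return filterAllRemaining() ? ReturnCode.SKIP : ReturnCode.INCLUDE;
    }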
    

    Note that the filter implementation returns ReturnCode.SKIP or ReturnCode.INCLUDE; it doesn't return columns directly. These flags tell the scanner whether each KeyValue should be returned to the client.
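
    A toy, self-contained Java model can make this flag protocol concrete. ReturnCode, Cell, CountLimitFilter and scanRow below are hypothetical stand-ins, not HBase classes, and HBase's real scanner loop is far more involved. The sketch only shows the division of labour: the filter answers with a flag for each cell, and the caller decides what to hand back.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model, not HBase code: a filter only answers with a flag per cell,
 * and the scanner decides what reaches the client.
 */
public class FilterProtocolDemo {

    // Hypothetical stand-in for HBase's Filter.ReturnCode (only two codes modeled).
    enum ReturnCode { INCLUDE, SKIP }

    // Hypothetical stand-in for a KeyValue: just a column name and a value.
    static final class Cell {
        final String column;
        final String value;

        Cell(String column, String value) {
            this.column = column;
            this.value = value;
        }
    }

    // Same counting logic as the ColumnCountGetFilter methods quoted above.
    static final class CountLimitFilter {
        private final int limit;
        private int count = 0;

        CountLimitFilter(int limit) {
            this.limit = limit;
        }

        boolean filterAllRemaining() {
            return count > limit;
        }

        ReturnCode filterKeyValue(Cell cell) {
            count++;
            return filterAllRemaining() ? ReturnCode.SKIP : ReturnCode.INCLUDE;
        }
    }

    // Hypothetical scanner loop: only cells flagged INCLUDE are handed back.
    static List<String> scanRow(List<Cell> row, CountLimitFilter filter) {
        List<String> returned = new ArrayList<>();
        for (Cell cell : row) {
            if (filter.filterKeyValue(cell) == ReturnCode.INCLUDE) {
                returned.add(cell.column);
            }
        }
        return returned;
    }

    public static void main(String[] args) {
        List<Cell> row = List.of(
                new Cell("cf:a", "1"), new Cell("cf:b", "2"), new Cell("cf:c", "3"));
        // With limit = 2, the third cell is answered with SKIP.
        System.out.println(scanRow(row, new CountLimitFilter(2))); // [cf:a, cf:b]
    }
}
```

    The filter never builds a result list itself; it only keeps a counter and returns a flag, which is why "returning columns" is really the scanner acting on INCLUDE.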

    You may need to implement a custom filter. The HBase filter package contains good examples; you can go through them and write your own.
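
    For the original question, a custom filter could defer its decision to the row level. HBase's Filter interface has a filterRow() hook (a last chance to veto the row based on the earlier per-KeyValue calls; returning true excludes it) and a reset() that clears state between rows, though exact method names and signatures vary between HBase versions. The sketch below is a plain-Java model of that shape, not an HBase Filter subclass; RequiredColumnsFilter, filterCell and scan are hypothetical names. Before writing a custom filter, it may be worth checking whether SingleColumnValueFilter with setFilterIfMissing(true), which drops rows that lack the tested column, already covers your case.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Toy model, not an HBase class: a row-level filter that drops rows missing
 * any required column, following the per-cell / filterRow / reset shape of
 * HBase's filter callbacks.
 */
public class RequiredColumnsFilterDemo {

    static final class RequiredColumnsFilter {
        private final Set<String> required;
        private final Set<String> seen = new HashSet<>();

        RequiredColumnsFilter(Set<String> required) {
            this.required = Set.copyOf(required);
        }

        // Per-cell callback (hypothetical name): record which columns this row has.
        void filterCell(String column) {
            seen.add(column);
        }

        // Row-level veto, analogous to Filter.filterRow(): true means "drop this row".
        boolean filterRow() {
            return !seen.containsAll(required);
        }

        // Clear per-row state before the next row, analogous to Filter.reset().
        void reset() {
            seen.clear();
        }
    }

    // Hypothetical scanner loop: returns the keys of rows that survive the filter.
    static List<String> scan(Map<String, List<String>> rows, RequiredColumnsFilter filter) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, List<String>> row : rows.entrySet()) {
            filter.reset();
            for (String column : row.getValue()) {
                filter.filterCell(column);
            }
            if (!filter.filterRow()) {
                kept.add(row.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, List<String>> rows = new LinkedHashMap<>();
        rows.put("row1", List.of("cf:name", "cf:email"));
        rows.put("row2", List.of("cf:name"));                        // missing cf:email
        rows.put("row3", List.of("cf:email", "cf:name", "cf:phone"));
        RequiredColumnsFilter filter =
                new RequiredColumnsFilter(Set.of("cf:name", "cf:email"));
        System.out.println(scan(rows, filter)); // [row1, row3]
    }
}
```

    The check lives in filterRow() rather than in the per-cell callback because the absence of a column can't be seen from any single cell; only after all of a row's cells have been offered does the filter know what is missing.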