I would like to perform a quick filtered scan in the hbase shell to retrieve any records where a specific column value is null or empty. I was told I can do this in PIG as well but don't know how to do that either. Can someone please assist with either method to get this resolved. I will as last option do in java but just thought this would be faster?
**TableName, columnFamily, column**
test:table collected:url
Thanks.
EDIT:
Tried the following:
scan 'test:table',{ COLUMNS => 'collected:url', LIMIT => 10, FILTER => "ValueFilter( =, 'binaryprefix:<>' )"}
No results so maybe it works but does this look correct?
EDIT 2:
Error returned:
hbase(main):008:0> scan 'test:table', {FILTER =>
SingleColumnValueFilter.new(Bytes.toBytes('collected'),
Bytes.toBytes('url'), CompareFilter::CompareOp.valueOf('EQUAL'),
Bytes.toBytes(''), true, true), COLUMNS => 'collected:url' }
ArgumentError: wrong number of arguments (6 for 4)
You don't really need pig for solving your problem.
In Hbase shell SingleColumnValueFilter should help you. You need to set filterIfMissing
to false
, what is actually the default value. So your scan should look like:
import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.util.Bytes
scan 'YourTable', {FILTER => SingleColumnValueFilter.new(Bytes.toBytes('family'), Bytes.toBytes('field'), CompareFilter::CompareOp.valueOf('EQUAL'), Bytes.toBytes('')), COLUMNS => 'family:field' }