I am trying to update a table with a German string, using the following command:
put 'table:data_validation_test', '58e1f4200f23e474ca2d7f3a', 'urlbody:data', 'Auslöser'
When I scan the table, this is what I get:
scan 'table:data_validation_test'
ROW COLUMN+CELL
58e1f4200f23e474ca2d7f3a column=urlbody:data, timestamp=1491215905923, value=Ausl\xC3\xB6ser
58e1f4200f23e474ca2d7f3a column=urlbody:id, timestamp=1491215697534, value=58e1f4200f23e474ca2d7f3a
I can't find a way to set the string encoding in HBase. How can I store the string as-is in HBase?
This is just a display issue with the scan command (the same happens with get); your string is in fact stored correctly. It shows up this way because ö is encoded on two bytes in UTF-8 (\xC3\xB6), and \xC3 and \xB6 cannot be displayed as readable characters on their own, so the shell escapes them. Remember that in HBase the main type is byte[]: values are stored as raw byte arrays, with no encoding attached.
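To see what is going on, you can check the raw bytes yourself from the shell's JRuby environment. A quick sketch (the hex formatting here is only for illustration; Bytes is the standard HBase utility class):

import org.apache.hadoop.hbase.util.Bytes
bytes = Bytes.toBytes('Auslöser')                              # the same UTF-8 bytes HBase stores
puts bytes.to_a.map { |b| format('%02X', b & 0xFF) }.join(' ') # => 41 75 73 6C C3 B6 73 65 72
puts Bytes.toString(bytes)                                     # => Auslöser

"Auslöser" is nine UTF-8 bytes, and ö is the two bytes C3 B6, which is exactly what scan prints in escaped form.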
If you try to get your string value using JRuby (inside the HBase shell):
include Java
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.HTable
import org.apache.hadoop.hbase.client.Get
import org.apache.hadoop.hbase.util.Bytes

# Build the configuration and open the table
config = HBaseConfiguration.create
htable = HTable.new(config, 'table:data_validation_test')

# Fetch the row and decode the cell value from raw bytes back to a UTF-8 string
result = htable.get(Get.new('58e1f4200f23e474ca2d7f3a'.to_java_bytes))
puts Bytes.toString(result.getValue('urlbody'.to_java_bytes, 'data'.to_java_bytes))
Then, your value should be displayed properly.
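The same decoding works for a full scan as well. Here is a small sketch (it reuses the htable and Bytes handles from the snippet above) that prints each urlbody:data value as a readable string:

import org.apache.hadoop.hbase.client.Scan

scanner = htable.getScanner(Scan.new)      # scan the whole table
while (row = scanner.next)                 # next returns nil once the scan is exhausted
  value = row.getValue('urlbody'.to_java_bytes, 'data'.to_java_bytes)
  puts Bytes.toString(value) if value      # skip rows without a urlbody:data cell
end
scanner.close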