I am trying to update a table with a German string, using the following command:
put 'table:data_validation_test', '58e1f4200f23e474ca2d7f3a', 'urlbody:data', 'Auslöser'
When I scan the table, this is what I get:
scan 'table:data_validation_test'
ROW COLUMN+CELL
58e1f4200f23e474ca2d7f3a column=urlbody:data, timestamp=1491215905923, value=Ausl\xC3\xB6ser
58e1f4200f23e474ca2d7f3a column=urlbody:id, timestamp=1491215697534, value=58e1f4200f23e474ca2d7f3a
I can't find a way to set the string encoding in HBase. How can I store the string as-is in HBase?
This is just a display issue with the scan command (the same happens with get); your string is in fact stored correctly. It shows up this way because ö is encoded on two bytes in UTF-8 (\xC3\xB6), and \xC3 and \xB6 cannot be displayed as readable characters on their own, so the shell escapes them. Remember that in HBase the main type is byte[]: values are stored as raw byte arrays, with no encoding attached.
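To see what is going on, you can check the raw bytes yourself from the shell's JRuby environment. A quick sketch (the hex formatting here is only for illustration; Bytes is the standard HBase utility class):

import org.apache.hadoop.hbase.util.Bytes
bytes = Bytes.toBytes('Auslöser')                              # the same UTF-8 bytes HBase stores
puts bytes.to_a.map { |b| format('%02X', b & 0xFF) }.join(' ') # => 41 75 73 6C C3 B6 73 65 72
puts Bytes.toString(bytes)                                     # => Auslöser

"Auslöser" is nine UTF-8 bytes, and ö is the two bytes C3 B6, which is exactly what scan prints in escaped form.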
If you try to get your string value using JRuby (inside the HBase shell):
include Java
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.HTable
import org.apache.hadoop.hbase.client.Get
import org.apache.hadoop.hbase.util.Bytes

# Build the configuration and open the table
config = HBaseConfiguration.create
htable = HTable.new(config, 'table:data_validation_test')

# Fetch the row and decode the cell value from raw bytes back to a UTF-8 string
result = htable.get(Get.new('58e1f4200f23e474ca2d7f3a'.to_java_bytes))
puts Bytes.toString(result.getValue('urlbody'.to_java_bytes, 'data'.to_java_bytes))
Then, your value should be displayed properly.
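The same decoding works for a full scan as well. Here is a small sketch (it reuses the htable and Bytes handles from the snippet above) that prints each urlbody:data value as a readable string:

import org.apache.hadoop.hbase.client.Scan

scanner = htable.getScanner(Scan.new)      # scan the whole table
while (row = scanner.next)                 # next returns nil once the scan is exhausted
  value = row.getValue('urlbody'.to_java_bytes, 'data'.to_java_bytes)
  puts Bytes.toString(value) if value      # skip rows without a urlbody:data cell
end
scanner.close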