We are using HDP 3. We are trying to insert PDF files into one of the columns of a particular column family in an HBase table. The development environment is Python 3.6 and the HBase connector is happybase 1.1.0.
We are unable to upload any PDF file greater than 10 MB to HBase.
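A minimal sketch of the kind of write we are doing (the Thrift host, table name, and column names below are placeholders, not our real ones):

import happybase

# Connect to the HBase Thrift server (host is a placeholder)
connection = happybase.Connection('thrift-host')
table = connection.table('documents')  # placeholder table name

with open('report.pdf', 'rb') as f:
    pdf_bytes = f.read()

# Each PDF goes into a single cell; this fails once the file
# exceeds roughly 10 MB.
table.put(b'row-1', {b'cf1:pdf': pdf_bytes})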
In HBase we have set the parameters as follows:
We get the following error:
IOError(message=b'org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: org.apache.hadoop.hbase.DoNotRetryIOException: Cell with size 80941994 exceeds limit of 10485760 bytes\n\tat org.apache.hadoop.hbase.regionserver.RSRpcServices.checkCellSizeLimit(RSRpcServices.java:937)\n\tat org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:1010)\n\tat org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicBatchOp(RSRpcServices.java:959)\n\tat org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:922)\n\tat org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2683)\n\tat org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42014)\n\tat org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)\n\tat org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)\n\tat org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)\n\tat
You have to check the HBase source code to see what is happening (the stack trace points at RSRpcServices.checkCellSizeLimit):
private void checkCellSizeLimit(final HRegion r, final Mutation m) throws IOException {
  if (r.maxCellSize > 0) {
    CellScanner cells = m.cellScanner();
    while (cells.advance()) {
      int size = PrivateCellUtil.estimatedSerializedSizeOf(cells.current());
      if (size > r.maxCellSize) {
        String msg = "Cell with size " + size + " exceeds limit of " + r.maxCellSize + " bytes";
        if (LOG.isDebugEnabled()) {
          LOG.debug(msg);
        }
        throw new DoNotRetryIOException(msg);
      }
    }
  }
}
Based on the error message, you are exceeding r.maxCellSize.
Note on the above: the function PrivateCellUtil.estimatedSerializedSizeOf is deprecated and will be removed in future versions. Here is its description:
Estimate based on keyvalue's serialization format in the RPC layer. Note that there is an extra SIZEOF_INT added to the size here that indicates the actual length of the cell for cases where cell's are serialized in a contiguous format (For eg in RPCs).
You have to check where the value is set. First, check the "ordinary" values in HRegion.java:
this.maxCellSize = conf.getLong(HBASE_MAX_CELL_SIZE_KEY, DEFAULT_MAX_CELL_SIZE);
So there are probably HBASE_MAX_CELL_SIZE_KEY and DEFAULT_MAX_CELL_SIZE constants defined somewhere:
public static final String HBASE_MAX_CELL_SIZE_KEY = "hbase.server.keyvalue.maxsize";
public static final int DEFAULT_MAX_CELL_SIZE = 10485760;
Here you have the 10485760-byte (10 MB) limit that shows up in your error message. If you need to, you can try raising this limit to a value that fits your data. I recommend testing it properly before going live with it, since the limit exists for a reason (as the description below notes, it protects the server from OOM situations).
Edit: adding information about how to change the value of hbase.server.keyvalue.maxsize. Check the configuration files, where you can read:
hbase.client.keyvalue.maxsize
Description: Specifies the combined maximum allowed size of a KeyValue instance. This is to set an upper boundary for a single entry saved in a storage file. Since they cannot be split it helps avoiding that a region cannot be split any further because the data is too large. It seems wise to set this to a fraction of the maximum region size. Setting it to zero or less disables the check.
Default: 10485760

hbase.server.keyvalue.maxsize
Description: Maximum allowed size of an individual cell, inclusive of value and all key components. A value of 0 or less disables the check. The default value is 10MB. This is a safety setting to protect the server from OOM situations.
Default: 10485760
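For reference, a sketch of what raising the limit could look like in hbase-site.xml (the 100 MB value is purely illustrative; pick one that covers your largest cell, and on HDP you would normally apply the change through Ambari and restart the affected services):

<property>
  <name>hbase.server.keyvalue.maxsize</name>
  <!-- illustrative value: 100 MB; must cover your largest cell -->
  <value>104857600</value>
</property>
<property>
  <name>hbase.client.keyvalue.maxsize</name>
  <!-- keep the client-side check in sync with the server-side one -->
  <value>104857600</value>
</property>

Since happybase talks to HBase through the Thrift server, the "client" in your setup is the Thrift server itself, so the client-side property should be applied to the configuration that the Thrift server reads.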