I am thinking of running a MapReduce job using Accumulo tables as input.
Is there a way to have two different tables as input, the same way addInputPath allows multiple input files?
Or is it possible to have one input come from a file and the other from a table with AccumuloInputFormat?
You probably want to take a look at AccumuloMultiTableInputFormat. The Accumulo manual demonstrates how to use it.
Example Usage:
job.setInputFormatClass(AccumuloMultiTableInputFormat.class);
AccumuloMultiTableInputFormat.setConnectorInfo(job, user, new PasswordToken(pass));
AccumuloMultiTableInputFormat.setMockInstance(job, INSTANCE_NAME);
InputTableConfig tableConfig1 = new InputTableConfig();
InputTableConfig tableConfig2 = new InputTableConfig();
Map<String, InputTableConfig> configMap = new HashMap<String, InputTableConfig>();
configMap.put(table1, tableConfig1);
configMap.put(table2, tableConfig2);
AccumuloMultiTableInputFormat.setInputTableConfigs(job, configMap);
See the unit test for AccumuloMultiTableInputFormat for some additional information.
Note that, unlike normal multiple inputs, you can't specify different Mappers to run on each table. It's not a massive problem in this case, though, since the incoming Key/Value types are the same and you can use:
RangeInputSplit split = (RangeInputSplit)c.getInputSplit();
String tableName = split.getTableName();
to work out, in your mapper, which table the records are coming from (taken from the Accumulo manual).
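Since a single Mapper sees records from every table, one way to keep the per-table logic separate is to route on the table name. The following is only a rough sketch of that idea in plain Java, with no Accumulo or Hadoop dependencies so it can be run on its own. The class TableDispatcher, its methods, and the table names are hypothetical and not part of Accumulo; the comment at the bottom shows where it would presumably hook into map(), assuming the RangeInputSplit lookup shown above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

/**
 * Hypothetical helper (not an Accumulo class): routes a record to
 * per-table logic, standing in for separate per-table Mappers.
 */
final class TableDispatcher {
    // Table name -> handler that turns (row, value) into an output string.
    private final Map<String, BiFunction<String, String, String>> handlers = new HashMap<>();

    /** Registers the logic to run for records read from the given table. */
    void register(String tableName, BiFunction<String, String, String> handler) {
        handlers.put(tableName, handler);
    }

    /** Runs the handler registered for tableName; fails loudly on unknown tables. */
    String dispatch(String tableName, String row, String value) {
        BiFunction<String, String, String> handler = handlers.get(tableName);
        if (handler == null) {
            throw new IllegalArgumentException("No handler registered for table: " + tableName);
        }
        return handler.apply(row, value);
    }
}

// Assumed wiring inside map(Key key, Value value, Context c):
//   String tableName = ((RangeInputSplit) c.getInputSplit()).getTableName();
//   String out = dispatcher.dispatch(tableName, key.getRow().toString(), value.toString());
```

Failing on an unregistered table name, rather than silently skipping the record, makes a misconfigured configMap show up early in the job.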