I want to write a custom UDF (UDAF/UDTF) that can take in a constant parameter.
For example, I want to write a function MAX(COL, i), where COL is the collection of values to search and i is the position (i.e., i = 1 finds the highest value, i = 2 the second highest, and so on), so that the Hive query looks like:
    SELECT
      MAX(value, 2)
    FROM table;
This isn't just for MAX, though. I need a general way to pass a constant parameter to a UDF, so a MAX-specific approach such as sorting the collection and selecting from the sorted result will not work.
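You can use ConstantObjectInspectors to get constant values passed as parameters. In the initialize() method of your GenericUDF, or in init() of your GenericUDAFEvaluator, check whether the ObjectInspector for that argument is an instance of ConstantObjectInspector. If it is, cast it and read the value; otherwise, throw an exception (in GenericUDF.initialize(), throw a UDFArgumentException). Because initialization runs before any rows are processed, the constant only has to be read once and can be kept in a field. ConstantObjectInspector is in org.apache.hadoop.hive.serde2.objectinspector, and IntWritable is org.apache.hadoop.io.IntWritable.

For example, in a GenericUDAFEvaluator: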
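    @Override
    public ObjectInspector init(Mode m, ObjectInspector[] parameters)
            throws HiveException {
        // ......
        // parameters holds the original arguments (value, position) only in PARTIAL1
        // and COMPLETE mode; in PARTIAL2 and FINAL it holds the partial-result inspector,
        // so parameters[1] does not exist there and the position must travel in the partial result.
        if (m == Mode.PARTIAL1 || m == Mode.COMPLETE) {
            // Literal arguments arrive as ConstantObjectInspectors.
            if (!(parameters[1] instanceof ConstantObjectInspector)) {
                throw new HiveException("Position parameter must be constant.");
            }
            ConstantObjectInspector posOI = (ConstantObjectInspector) parameters[1];
            pos = ((IntWritable) posOI.getWritableConstantValue()).get();
        }
        // ...... (remaining initialization; returns the output ObjectInspector)
    }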
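The Hive snippet needs Hive on the classpath, so here is a dependency-free sketch of the same control flow. ArgInspector, ColumnArgInspector, ConstantArgInspector and NthLargestEvaluator are hypothetical stand-ins invented for this illustration, not Hive classes. What it shows: the constant is validated and read once in init(), stored in a field, and reused for every row; a min-heap bounded to pos elements then yields the pos-th largest value without sorting the whole group.

```java
import java.util.PriorityQueue;

public class NthLargestSketch {

    // Hypothetical stand-ins for Hive's ObjectInspector / ConstantObjectInspector.
    // They only model "is this argument a literal, and if so, what is its value".
    interface ArgInspector {}

    static final class ColumnArgInspector implements ArgInspector {}

    static final class ConstantArgInspector implements ArgInspector {
        private final Object value;
        ConstantArgInspector(Object value) { this.value = value; }
        Object getConstantValue() { return value; }
    }

    // Mirrors the shape of a UDAF evaluator: init once, iterate per row, terminate at the end.
    static final class NthLargestEvaluator {
        private int pos; // the constant position, read once in init()
        private final PriorityQueue<Double> topValues = new PriorityQueue<>(); // min-heap

        void init(ArgInspector[] params) {
            if (params.length != 2) {
                throw new IllegalArgumentException("Expected exactly two arguments.");
            }
            if (!(params[1] instanceof ConstantArgInspector)) {
                throw new IllegalArgumentException("Position parameter must be constant.");
            }
            Object raw = ((ConstantArgInspector) params[1]).getConstantValue();
            if (!(raw instanceof Integer) || (Integer) raw < 1) {
                throw new IllegalArgumentException("Position must be a positive integer.");
            }
            pos = (Integer) raw;
        }

        // Keep only the pos largest values seen so far; the heap's head is the smallest of them.
        void iterate(double value) {
            topValues.offer(value);
            if (topValues.size() > pos) {
                topValues.poll();
            }
        }

        // The head of the bounded heap is the pos-th largest value, or null if too few rows were seen.
        Double terminate() {
            return topValues.size() < pos ? null : topValues.peek();
        }
    }

    public static void main(String[] args) {
        NthLargestEvaluator eval = new NthLargestEvaluator();
        eval.init(new ArgInspector[] { new ColumnArgInspector(), new ConstantArgInspector(2) });
        for (double v : new double[] { 5.0, 9.0, 1.0, 7.0 }) {
            eval.iterate(v);
        }
        System.out.println(eval.terminate()); // prints 7.0
    }
}
```

Keeping only pos values means the memory per group is bounded by the position rather than by the number of rows in the group.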
For your specific use case, check out collect_max in Brickhouse (http://github.com/klout/brickhouse), which collects the top N keys and their max values.
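To make "the top N keys and their max values" concrete, here is a rough plain-Java sketch of that kind of aggregation. It is not Brickhouse's code: the class name TopNKeysSketch and the choice to keep the larger value when a key repeats are assumptions for illustration. N plays the role of the constant parameter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TopNKeysSketch {

    private final int n; // constant N; in a Hive UDAF this would be read from a ConstantObjectInspector in init()
    private final Map<String, Double> maxPerKey = new HashMap<>();

    TopNKeysSketch(int n) {
        if (n < 1) throw new IllegalArgumentException("N must be positive.");
        this.n = n;
    }

    // Per row: remember the largest value seen for each key (assumed handling of repeated keys).
    void iterate(String key, double value) {
        maxPerKey.merge(key, value, Math::max);
    }

    // At the end: the N keys with the highest values, ordered from highest to lowest.
    Map<String, Double> terminate() {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(maxPerKey.entrySet());
        entries.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : entries.subList(0, Math.min(n, entries.size()))) {
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        TopNKeysSketch agg = new TopNKeysSketch(2);
        agg.iterate("a", 3.0);
        agg.iterate("b", 8.0);
        agg.iterate("a", 10.0);
        agg.iterate("c", 5.0);
        System.out.println(agg.terminate()); // prints {a=10.0, b=8.0}
    }
}
```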