I want to know whether it is possible to perform aggregation operations on values stored in multiple rows. For example, I have the following table:
rowID  colFam  colQual  value
00000  0000    A        12
00000  0001    B        Test
00001  0000    A        35
00001  0001    B        Foo
00002  0000    A        7
00002  0001    B        Bar
What I am trying to do is find the average of all values stored with column qualifier A. Is this possible using Accumulo's Iterators, Filters, or Combiners?
I saw the StatsCombiner, but that combiner aggregates across different versions of the same key (rowID, colFam, and colQual are the same but the timestamp is different) rather than across distinct keys.
Combiners (and their predecessors, Aggregators) aggregate values for the same key. You can create an iterator that transforms multiple keys into a single key, but you'll still have to aggregate in the client, because each tablet will produce its own partial computations.
You could use Apache Fluo's "observers" to keep your aggregate stats up to date while you ingest into your table.
There are probably multiple solutions. I would suggest taking a look at Apache Fluo. If you really don't want to use that, consider computing partial sums and counts with an iterator in each tablet, and doing the final aggregation on the client side.
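To make the partial sum/count idea concrete, here is a minimal sketch of just the arithmetic in plain Java. It deliberately does not use the Accumulo API: `Partial`, `summarize`, and `merge` are hypothetical names, `summarize` only stands in for whatever a custom iterator would emit per tablet, and the split of the example rows across two tablets is assumed for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PartialAverage {

    /** Hypothetical partial result: what a per-tablet iterator might emit. */
    static final class Partial {
        final long sum;
        final long count;

        Partial(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    /** Stand-in for the server side: fold one tablet's qualifier-A values into one partial. */
    static Partial summarize(List<Long> values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return new Partial(sum, values.size());
    }

    /** Client side: merge the partials returned by all tablets. */
    static Partial merge(List<Partial> partials) {
        long sum = 0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        return new Partial(sum, count);
    }

    /** Final average, computed only once every partial has been merged. */
    static double average(Partial p) {
        if (p.count == 0) {
            throw new IllegalArgumentException("no values to average");
        }
        return (double) p.sum / p.count;
    }

    public static void main(String[] args) {
        // Assumed split: tablet 1 holds rows 00000 and 00001, tablet 2 holds row 00002.
        List<Partial> partials = new ArrayList<>();
        partials.add(summarize(Arrays.asList(12L, 35L)));
        partials.add(summarize(Arrays.asList(7L)));

        Partial total = merge(partials);
        System.out.println(average(total)); // (12 + 35 + 7) / 3 = 18.0
    }
}
```

The reason to ship sums and counts rather than per-tablet averages is that averages don't combine correctly when tablets hold different numbers of values: with the split above, averaging the two tablet averages gives (23.5 + 7) / 2 = 15.25, not 18.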