
Java: implement an accumulator class that provides a Collector


A Collector has three generic types:

public interface Collector<T, A, R>

With A being the mutable accumulation type of the reduction operation (often hidden as an implementation detail).

If I want to create my custom collector, I need to create two classes:

  • one for the custom accumulation type
  • one for the custom collector itself

Is there any library function/trick that takes the accumulation type and provides a corresponding Collector?

Simple example

This example is deliberately simple to illustrate the question. I know I could use reduce for this case, but that is not what I am looking for. My real use case is more complex, and sharing it here would make the question too long, but the idea is the same.

Let's say I want to collect the sum of a stream and return it as a String.

I can implement my accumulator class:

public static class SumCollector {
    Integer value;

    public SumCollector(Integer value) {
        this.value = value;
    }

    public static SumCollector supply() {
        return new SumCollector(0);
    }

    public void accumulate(Integer next) {
        value += next;
    }

    public SumCollector combine(SumCollector other) {
        return new SumCollector(value + other.value);
    }

    public String finish() {
        return Integer.toString(value);
    }
}

And then I can create a Collector from this class:

Collector.of(SumCollector::supply, SumCollector::accumulate, SumCollector::combine, SumCollector::finish);

But it seems strange to me that all four arguments refer to the other class. I feel that there should be a more direct way to do this.

What I could do to keep only one class is to implement Collector<Integer, SumCollector, String>, but then every function would be duplicated (supplier() would return SumCollector::supply, etc.).
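
For illustration, a minimal sketch of that single-class variant could look like the following. The class name SelfCollectingSum is made up for this example, and the empty characteristics set is an assumption; the point is only that each Collector method merely forwards to a state-handling method of the same class.

```java
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.Stream;

// Hypothetical single-class variant: the accumulator type is also the Collector.
class SelfCollectingSum implements Collector<Integer, SelfCollectingSum, String> {

    private int value;

    // State-handling methods, mirroring SumCollector.
    void accumulate(Integer next) { value += next; }

    SelfCollectingSum combine(SelfCollectingSum other) {
        value += other.value;
        return this;
    }

    String finish() { return Integer.toString(value); }

    // Collector contract: every method only forwards to one of the methods above.
    @Override public Supplier<SelfCollectingSum> supplier() { return SelfCollectingSum::new; }
    @Override public BiConsumer<SelfCollectingSum, Integer> accumulator() { return SelfCollectingSum::accumulate; }
    @Override public BinaryOperator<SelfCollectingSum> combiner() { return SelfCollectingSum::combine; }
    @Override public Function<SelfCollectingSum, String> finisher() { return SelfCollectingSum::finish; }
    @Override public Set<Characteristics> characteristics() { return Set.of(); }

    public static void main(String[] args) {
        // The instance passed to collect() acts only as the Collector; its own value stays unused.
        String sum = Stream.of(1, 2, 3).collect(new SelfCollectingSum());
        System.out.println(sum); // prints 6
    }
}
```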


Solution

  • I want to focus on the wording of one point of your question, because I feel like it could be the crux of the underlying confusion.

    If I want to create my custom collector, I need to create two classes:

      • one for the custom accumulation type
      • one for the custom collector itself

    No, you need to create only one class, that of your custom accumulator. You should use the appropriate factory method to instantiate your custom Collector, as you demonstrate yourself in the question.

    Perhaps you meant to say that you need to create two instances. That is also incorrect: you need to create one Collector instance, but to support the general case, many instances of the accumulator can be created (e.g., by groupingBy()). Thus, you can't simply instantiate the accumulator yourself; you need to provide its Supplier to the Collector and delegate to the Collector the ability to create as many instances as required.
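
As a rough illustration of why a Supplier is needed, the sketch below counts how often the supplier is invoked when a single Collector is used downstream of groupingBy() on a sequential stream. The names SupplierCallsDemo, countingSum and sumByRemainder are invented for this example, and an int[] stands in for the accumulator class.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class SupplierCallsDemo {

    // Counts how often the stream machinery asks for a fresh accumulator.
    static final AtomicInteger SUPPLIER_CALLS = new AtomicInteger();

    // One Collector instance; the accumulator is a one-element int array.
    static Collector<Integer, int[], String> countingSum() {
        return Collector.of(
                () -> { SUPPLIER_CALLS.incrementAndGet(); return new int[1]; },
                (acc, next) -> acc[0] += next,
                (left, right) -> { left[0] += right[0]; return left; },
                acc -> Integer.toString(acc[0]));
    }

    // Groups the values by remainder modulo 3 and sums each group.
    static Map<Integer, String> sumByRemainder(List<Integer> ints) {
        return ints.stream().collect(Collectors.groupingBy(i -> i % 3, countingSum()));
    }

    public static void main(String[] args) {
        Map<Integer, String> sums = sumByRemainder(List.of(1, 2, 3, 4, 5, 6));
        System.out.println(sums);
        // Three groups, so the supplier was called three times.
        System.out.println(SUPPLIER_CALLS.get()); // prints 3
    }
}
```

Only one Collector was created here, yet each group received its own accumulator, which is something a pre-built accumulator instance could not provide.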

    Now, think about the overloaded Collector.of() method you feel is missing, the "more direct way to do this." Clearly, such a method would still require a Supplier, one that would create instances of your custom accumulator. But Stream.collect() needs to interact with your custom accumulator instances to perform the accumulate and combine operations. So the Supplier would have to instantiate something like this Accumulator interface:

    public interface Accumulator<T, A extends Accumulator<T, A, R>, R> {
    
        /**
         * @param t a value to be folded into this mutable result container
         */
        void accumulate(T t);
    
        /**
         * @param that another partial result to be merged with this container
         * @return the combined results, which may be {@code this}, {@code that}, or a new container
         */
        A combine(A that);
    
        /**
         * @return the final result of transforming this intermediate accumulator
         */
        R finish();
    
    }
    

    With that, it's then straightforward to create Collector instances from a Supplier<Accumulator>, for example with a static factory method declared on the Accumulator interface itself:

        static <T, A extends Accumulator<T, A, R>, R> 
        Collector<T, ?, R> of(Supplier<A> supplier, Collector.Characteristics ... characteristics) {
            return Collector.of(supplier, 
                                Accumulator::accumulate, 
                                Accumulator::combine, 
                                Accumulator::finish, 
                                characteristics);
        }
    

    Then, you'd be able to define your custom Accumulator:

    final class Sum implements Accumulator<Integer, Sum, String> {
    
        private int value;
    
        @Override
        public void accumulate(Integer next) {
            value += next;
        }
    
        @Override
        public Sum combine(Sum that) {
            value += that.value;
            return this;
        }
    
        @Override
        public String finish(){
            return Integer.toString(value);
        }
    
    }
    

    And use it:

    String sum = ints.stream().collect(Accumulator.of(Sum::new, Collector.Characteristics.UNORDERED));
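
Put together, the fragments above could be assembled into a runnable sketch like the one below. The AccumulatorDemo class and its sum helper are invented for this example, the lambdas inside of() are meant to be equivalent to the method references shown earlier, and the parallel stream is used only so that combine() has a chance to run.

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collector;

interface Accumulator<T, A extends Accumulator<T, A, R>, R> {

    void accumulate(T t);

    A combine(A that);

    R finish();

    // Adapts any Supplier of accumulators to a Collector.
    static <T, A extends Accumulator<T, A, R>, R>
    Collector<T, ?, R> of(Supplier<A> supplier, Collector.Characteristics... characteristics) {
        return Collector.of(supplier,
                            (acc, t) -> acc.accumulate(t),
                            (left, right) -> left.combine(right),
                            acc -> acc.finish(),
                            characteristics);
    }
}

final class Sum implements Accumulator<Integer, Sum, String> {

    private int value;

    @Override public void accumulate(Integer next) { value += next; }

    @Override public Sum combine(Sum that) { value += that.value; return this; }

    @Override public String finish() { return Integer.toString(value); }
}

// Hypothetical driver class, not part of the answer's original code.
class AccumulatorDemo {

    static String sum(List<Integer> ints) {
        // A parallel stream may create several Sum instances and merge them via combine().
        return ints.parallelStream()
                   .collect(Accumulator.of(Sum::new, Collector.Characteristics.UNORDERED));
    }

    public static void main(String[] args) {
        System.out.println(sum(List.of(1, 2, 3, 4))); // prints 10
    }
}
```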
    

    Now… it works, and there's nothing too horrible about it, but is all the Accumulator<A extends Accumulator<A>> mumbo-jumbo "more direct" than this?

    final class Sum {
    
        private int value;
    
        private void accumulate(Integer next) {
            value += next;
        }
    
        private Sum combine(Sum that) {
            value += that.value;
            return this;
        }
    
        @Override
        public String toString() {
            return Integer.toString(value);
        }
    
        static Collector<Integer, ?, String> collector() {
            return Collector.of(Sum::new, Sum::accumulate, Sum::combine, Sum::toString, Collector.Characteristics.UNORDERED);
        }
    
    }
    

    And really, why have an Accumulator dedicated to collecting to a String? Wouldn't reduction to a custom type be more interesting? Something along the lines of IntSummaryStatistics, which has other useful methods like getAverage() alongside toString()? This approach is a lot more powerful, requires only one (mutable) class (the result type), and can encapsulate all of its mutators as private methods rather than implementing a public interface.
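
A minimal sketch of such a result type, assuming a made-up class named IntTally, might look like this. The mutators stay private, the stream pipeline hands back the accumulator itself (the four-argument Collector.of() overload takes no finisher), and callers query it afterwards.

```java
import java.util.List;
import java.util.stream.Collector;

// Hypothetical result type: mutable while collecting, queried afterwards.
final class IntTally {

    private long sum;
    private long count;

    // Mutators are private; only the collector below can reach them.
    private void accept(Integer next) { sum += next; count++; }

    private IntTally merge(IntTally that) {
        sum += that.sum;
        count += that.count;
        return this;
    }

    long sum() { return sum; }

    long count() { return count; }

    // Returns 0.0 for an empty tally instead of dividing by zero.
    double average() { return count == 0 ? 0.0 : (double) sum / count; }

    @Override
    public String toString() { return "IntTally{sum=" + sum + ", count=" + count + "}"; }

    // No finisher: the accumulator is returned as the result.
    static Collector<Integer, ?, IntTally> collector() {
        return Collector.of(IntTally::new, IntTally::accept, IntTally::merge,
                            Collector.Characteristics.UNORDERED);
    }
}

class IntTallyDemo {
    public static void main(String[] args) {
        IntTally tally = List.of(1, 2, 3, 4).stream().collect(IntTally.collector());
        System.out.println(tally.average()); // prints 2.5
    }
}
```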

    So, you're welcome to use something like Accumulator, but it doesn't fill a real gap in the core Collector repertoire.