java lambda java-8 java-stream

How can I convert a Stream of Strings to Stream of String pairs?


I want to take a stream of strings and turn it into a stream of word pairs. For example:

I have: { "A", "Apple", "B", "Banana", "C", "Carrot" }

I want: { ("A", "Apple"), ("Apple", "B"), ("B", "Banana"), ("Banana", "C") }.

This is nearly the same as zipping, as outlined in "Zipping streams using JDK8 with lambda (java.util.stream.Streams.zip)".

However, that produces: { (A, Apple), (B, Banana), (C, Carrot) }

The following code works, but it is clearly the wrong way to do it (it is not thread safe, among other problems):

static String buffered = null;

static void output(String s) {
    String result = null;
    if (buffered != null) {
        result = buffered + "," + s;
    } else {
        result = null;
    }

    buffered = s;
    System.out.println(result);
}

// ***** 

Stream<String> testing = Stream.of("A", "Apple", "B", "Banana", "C", "Carrot");
testing.forEach(s -> {output(s);});

Solution

  • If you:

    1. Don't like the idea of creating a list with all strings from your stream
    2. Don't want to use external libraries
    3. Like to get your hands dirty

    Then you can create a method to group elements from a stream using Java 8 low-level stream builders StreamSupport and Spliterator:

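    import java.util.ArrayList;
    import java.util.List;
    import java.util.Spliterator;
    import java.util.Spliterators;
    import java.util.function.Consumer;
    import java.util.stream.Stream;
    import java.util.stream.StreamSupport;

    class StreamUtils {
        // Sliding window of the given size that advances one element at a time.
        public static <T> Stream<List<T>> sliding(int size, Stream<T> stream) {
            return sliding(size, 1, stream);
        }

        public static <T> Stream<List<T>> sliding(int size, int step, Stream<T> stream) {
            // Guard: buffer.subList(step, size) below requires 1 <= step <= size.
            if (size < 1 || step < 1 || step > size) {
                throw new IllegalArgumentException("expected 1 <= step <= size");
            }

            Spliterator<T> spliterator = stream.spliterator();
            long estimateSize;

            if (!spliterator.hasCharacteristics(Spliterator.SIZED)) {
                estimateSize = Long.MAX_VALUE;   // unknown or infinite source
            } else if (size > spliterator.estimateSize()) {
                estimateSize = 0;                // not enough elements for one window
            } else {
                estimateSize = (spliterator.estimateSize() - size) / step + 1;
            }

            // The windows are lists, so the source's SORTED flag no longer applies.
            int characteristics = spliterator.characteristics() & ~Spliterator.SORTED;

            return StreamSupport.stream(
                    new Spliterators.AbstractSpliterator<List<T>>(estimateSize, characteristics) {
                        List<T> buffer = new ArrayList<>(size);

                        @Override
                        public boolean tryAdvance(Consumer<? super List<T>> consumer) {
                            // Pull from the source until the window is full or the source is exhausted.
                            while (buffer.size() < size && spliterator.tryAdvance(buffer::add)) {
                                // Nothing to do
                            }

                            if (buffer.size() == size) {
                                // Keep the overlapping tail for the next window, then emit this one.
                                List<T> keep = new ArrayList<>(buffer.subList(step, size));
                                consumer.accept(buffer);
                                buffer = keep;
                                return true;
                            }
                            return false;   // a trailing partial window is dropped
                        }
                    }, stream.isParallel());
        }
    }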
    

    The method and parameter names were inspired by their Scala counterparts.

    Let's test it:

    Stream<String> testing = Stream.of("A", "Apple", "B", "Banana", "C", "Carrot");
    System.out.println(StreamUtils.sliding(2, testing).collect(Collectors.toList()));
    

    [[A, Apple], [Apple, B], [B, Banana], [Banana, C], [C, Carrot]]

    What about not repeating elements? Use a step equal to the window size:

    Stream<String> testing = Stream.of("A", "Apple", "B", "Banana", "C", "Carrot");
    System.out.println(StreamUtils.sliding(2, 2, testing).collect(Collectors.toList()));
    

    [[A, Apple], [B, Banana], [C, Carrot]]
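
    The number of windows in both runs matches the size estimate computed inside sliding: (n - size) / step + 1 when the source has at least size elements, and 0 otherwise. A minimal sketch of that arithmetic, where WindowCount and windowCount are hypothetical names used only for illustration:

```java
class WindowCount {
    // Hypothetical helper mirroring the estimateSize arithmetic in sliding:
    // n is the number of source elements (integer division rounds down).
    static long windowCount(long n, int size, int step) {
        return size > n ? 0 : (n - size) / step + 1;
    }

    public static void main(String[] args) {
        System.out.println(windowCount(6, 2, 1)); // 5, as in the first test
        System.out.println(windowCount(6, 2, 2)); // 3, as in the second test
        System.out.println(windowCount(1, 2, 1)); // 0, too few elements for one window
    }
}
```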

    And now with an infinite Stream:

    StreamUtils.sliding(5, Stream.iterate(0, n -> n + 1))
            .limit(5)
            .forEach(System.out::println);
    

    [0, 1, 2, 3, 4]
    [1, 2, 3, 4, 5]
    [2, 3, 4, 5, 6]
    [3, 4, 5, 6, 7]
    [4, 5, 6, 7, 8]
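
    The question asks for pairs rather than lists. One possible way to get there is to map each window of size 2 to a Map.Entry. The sketch below is only an illustration: PairDemo and pairs are hypothetical names, and the class repeats a condensed copy of sliding (sequential only, no size estimate) so that it compiles on its own.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class PairDemo {
    // Condensed copy of sliding(size, step, stream); assumes 1 <= step <= size.
    static <T> Stream<List<T>> sliding(int size, int step, Stream<T> stream) {
        Spliterator<T> source = stream.spliterator();
        return StreamSupport.stream(
                new Spliterators.AbstractSpliterator<List<T>>(Long.MAX_VALUE, Spliterator.ORDERED) {
                    List<T> buffer = new ArrayList<>(size);

                    @Override
                    public boolean tryAdvance(Consumer<? super List<T>> consumer) {
                        while (buffer.size() < size && source.tryAdvance(buffer::add)) {
                            // keep filling the window
                        }
                        if (buffer.size() == size) {
                            List<T> keep = new ArrayList<>(buffer.subList(step, size));
                            consumer.accept(buffer);
                            buffer = keep;
                            return true;
                        }
                        return false;
                    }
                }, false);
    }

    // Hypothetical helper: each element paired with its successor.
    static <T> Stream<Map.Entry<T, T>> pairs(Stream<T> stream) {
        return sliding(2, 1, stream)
                .<Map.Entry<T, T>>map(w -> new SimpleImmutableEntry<T, T>(w.get(0), w.get(1)));
    }

    public static void main(String[] args) {
        Stream<String> testing = Stream.of("A", "Apple", "B", "Banana", "C", "Carrot");
        System.out.println(pairs(testing).collect(Collectors.toList()));
        // prints [A=Apple, Apple=B, B=Banana, Banana=C, C=Carrot]
    }
}
```

    Map.Entry is used here only because the JDK has no dedicated pair type; a small custom class would work equally well.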