javajava-8spliterator

Spliterator parallel sort order


I am implementing a paged spliterator (in Java) which should allow parallel access.

I have the following test case (test is in Groovy with Spock):

def 'parallel, two pages'()
{
    when: 'a sorted range from 0 to 6'
    def fetcher = new IntegerRangePageFetcher(6)

    and: 'a spliterator with a page size of 5'
    def spliterator = new PagedSpliterator(fetcher, 5)

    and: 'a stream with the given range is collected to a list'
    def result = StreamSupport
            .stream(spliterator, true)
            .collect(Collectors.toList())

    then: 'the sort order is obeyed'
    expect result, contains(0, 1, 2, 3, 4, 5)
}

This testcase fails with the following error:

Condition not satisfied:

expect result, contains(0, 1, 2, 3, 4, 5)
|      |
false  [5, 0, 1, 2, 3, 4]

Expected: iterable containing [<0>, <1>, <2>, <3>, <4>, <5>]
     but: item 0: was <5>

The spliterator has the characteristics()

return IMMUTABLE | ORDERED | SIZED | SUBSIZED | NONNULL;

The code works when I don't use parallel. So I don't understand ORDERED:

--- Update based on feedback ---

Thanks for your answers, I have two logical errors in my code. First the requested snippets:

@Override
public Spliterator<T> trySplit()
{
    // first query
    if (pageIterator == null) {
        pageIterator = pageFetcher.fetchNextPage(paginationInfo);
    }

    // delegate split decision
    var newPaginationInfo = paginationInfo.split();
    if (newPaginationInfo == null) {
        log.info("* Spliterator returns null");
        return null;
    }

    // now we split
    var newSpliterator = new PagedSpliterator<>(pageFetcher, newPaginationInfo);
    return newSpliterator;
}

public PaginationInfo split()
{
    // when open range or nothing left we don't split
    if ((endElementIndex == -1) || !hasNextPage()) {
        return null;
    }

    // calculate the splitting position
    var firstHalfPages = (getEndPageIndex() - getNextPageIndex()) / 2;
    var midElementIndex = (getNextPageIndex() + firstHalfPages) * pageSize;

    // create an additional PaginationInfo and set the ranges according to the split position
    var newPaginationInfo = new PaginationInfo(this);
    newPaginationInfo.firstElementOnNextPageIndex = midElementIndex;
    newPaginationInfo.nextElementIndex = midElementIndex;

    endElementIndex = midElementIndex;

    return newPaginationInfo;
}

First mistake:

the newly created Spliterator is set to the second half range instead of the first one. I read about the prefix in the documentation, but it feels very clumsy for me. I split at page size to have multiple parallel requests. At the beginning (the first spliterator instance) I have to fetch the first page to get the page and element counter. So to fix the order issue I have to handout the fetched data from the first spliterator to the second spliterator to obey the order, this feels very strange and not intuitive to me.

Second mistake:

    // first query
    if (pageIterator == null) {
        pageIterator = pageFetcher.fetchNextPage(paginationInfo);
    }

All subsequent created spliterators will receive a estimateSize() and a trySplit() call from the framework. During this calls currently I fetch a page but this would block parallelism, the fetch must occur later in the tryAdvance() call.

I will implement this changes and then come back to you.


Solution

  • Yes, you have a bug in your trySplit. The documentation for Spliterator.trySplit specifies that the returned spliterator must contain a prefix of the elements, if you have the ORDERED characteristic. Switch around the returned Spliterator and the remaining contents of the split one.