I am implementing a paged spliterator (in Java) which should allow parallel access.
I have the following test case (test is in Groovy with Spock):
def 'parallel, two pages'()
{
when: 'a sorted range from 0 to 6'
def fetcher = new IntegerRangePageFetcher(6)
and: 'a spliterator with a page size of 5'
def spliterator = new PagedSpliterator(fetcher, 5)
and: 'a stream with the given range is collected to a list'
def result = StreamSupport
.stream(spliterator, true)
.collect(Collectors.toList())
then: 'the sort order is obeyed'
expect result, contains(0, 1, 2, 3, 4, 5)
}
This testcase fails with the following error:
Condition not satisfied:
expect result, contains(0, 1, 2, 3, 4, 5)
| |
false [5, 0, 1, 2, 3, 4]
Expected: iterable containing [<0>, <1>, <2>, <3>, <4>, <5>]
but: item 0: was <5>
The spliterator has the characteristics()
return IMMUTABLE | ORDERED | SIZED | SUBSIZED | NONNULL;
The code works when I don't use parallel. So I don't understand ORDERED
:
trySplit
implementation and have to split following a given order? (currently I split in the mid of the open pages, 0-mid stays at the current spliterator, mid-end goes in the newly created spliterator)sort()
before collect()
because the framework doesn't guarantee any order at all?--- Update based on feedback ---
Thanks for your answers, I have two logical errors in my code. First the requested snippets:
@Override
public Spliterator<T> trySplit()
{
// first query
if (pageIterator == null) {
pageIterator = pageFetcher.fetchNextPage(paginationInfo);
}
// delegate split decision
var newPaginationInfo = paginationInfo.split();
if (newPaginationInfo == null) {
log.info("* Spliterator returns null");
return null;
}
// now we split
var newSpliterator = new PagedSpliterator<>(pageFetcher, newPaginationInfo);
return newSpliterator;
}
public PaginationInfo split()
{
// when open range or nothing left we don't split
if ((endElementIndex == -1) || !hasNextPage()) {
return null;
}
// calculate the splitting position
var firstHalfPages = (getEndPageIndex() - getNextPageIndex()) / 2;
var midElementIndex = (getNextPageIndex() + firstHalfPages) * pageSize;
// create an additional PaginationInfo and set the ranges according to the split position
var newPaginationInfo = new PaginationInfo(this);
newPaginationInfo.firstElementOnNextPageIndex = midElementIndex;
newPaginationInfo.nextElementIndex = midElementIndex;
endElementIndex = midElementIndex;
return newPaginationInfo;
}
First mistake:
the newly created Spliterator is set to the second half range instead of the first one. I read about the prefix in the documentation, but it feels very clumsy for me. I split at page size to have multiple parallel requests. At the beginning (the first spliterator instance) I have to fetch the first page to get the page and element counter. So to fix the order issue I have to handout the fetched data from the first spliterator to the second spliterator to obey the order, this feels very strange and not intuitive to me.
Second mistake:
// first query
if (pageIterator == null) {
pageIterator = pageFetcher.fetchNextPage(paginationInfo);
}
All subsequent created spliterators will receive a estimateSize()
and a trySplit()
call from the framework. During this calls currently I fetch a page but this would block parallelism, the fetch must occur later in the tryAdvance()
call.
I will implement this changes and then come back to you.
Yes, you have a bug in your trySplit. The documentation for Spliterator.trySplit specifies that the returned spliterator must contain a prefix of the elements, if you have the ORDERED characteristic. Switch around the returned Spliterator and the remaining contents of the split one.