java alfresco cmis opencmis cmis-workbench

OpenCMIS query slower than Apache CMIS Workbench

I'm running a pretty simple query:

SELECT cmis:objectId, cmis:name, cmis:parentId
FROM cmis:folder
ORDER BY cmis:name

Running this query with the Apache CMIS Workbench takes ~15 sec. Running the same query with OpenCMIS is pretty quick, but going through the results is terribly slow (~3 min).

session.query( queryStmt, false).iterator().toList()

By splitting the calls like this:

def rs = session.query( queryStmt, false)
def iterator = rs.iterator()
def folders = iterator.toList()

I was able to determine that toList() is where it's slow, but I don't get why.

I also tried defining an OperationContext and using it with the query. Same result. Here is my OperationContext:

def filter = "cmis:objectId,cmis:name,cmis:parentId"
def context = session.createOperationContext()
context.setCacheEnabled(false)
context.setFilterString(filter)
context.setRenditionFilterString(filter)

Any idea how to make this query faster?

Solution

The CMIS Workbench only fetches the first 100 hits by default. Depending on the repository, that's usually fast. Increase "max hits" to get more.

To replicate what the CMIS Workbench is doing, try this code snippet:

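import org.apache.chemistry.opencmis.client.api.OperationContext;

String queryStmt = "SELECT cmis:objectId, cmis:name, cmis:parentId FROM cmis:folder ORDER BY cmis:name";
int maxHits = 100;

// Ask the repository for at most maxHits results per call.
OperationContext context = session.createOperationContext();
context.setMaxItemsPerPage(maxHits);

// getPage(maxHits) restricts the result to the first page only.
// toList() on an Iterator is a Groovy extension; in plain Java, loop over the page instead.
session.query(queryStmt, false, context).getPage(maxHits).iterator().toList();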

toList() iterates over all results of the query. The default OperationContext defines batches of 100 hits. That is, under the hood OpenCMIS makes several (possibly many) query calls to the repository: one for the first 100 hits, one for the second 100, one for the third 100, and so on.

If you have a million hits in total, you end up with quite a few back-end calls.

Try increasing the batch size with context.setMaxItemsPerPage(1000000). This is usually faster if you have many hits and want them all as a list.
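To put rough numbers on this, here is a small sketch of the arithmetic. The class and method names are made up for illustration, and the result is only an estimate: it ignores any extra calls a particular client or binding might make.

```java
/** Rough estimate of repository round trips for a paged query (illustrative only). */
class QueryRoundTrips {

    /** Calls needed to fetch totalHits when each call returns at most pageSize hits. */
    static long roundTrips(long totalHits, int pageSize) {
        if (totalHits < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("totalHits must be >= 0 and pageSize > 0");
        }
        // Ceiling division: a partially filled last page still costs one call.
        return (totalHits + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        long totalHits = 1_000_000L;
        System.out.println("page size 100:     " + roundTrips(totalHits, 100) + " calls");
        System.out.println("page size 1000000: " + roundTrips(totalHits, 1_000_000) + " calls");
    }
}
```

With a million hits, a page size of 100 means about 10,000 calls, while a page size of 1,000,000 means a single call that returns one very large response.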

Small batches are better when you consume the results in a loop and don't need them all at once. They also make it possible to handle result sets that don't fit into the client's memory.
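The following is a simplified model of that behaviour, not the actual OpenCMIS implementation. PagedResults, fetchPage and backendCalls are hypothetical names, and the "hits" are plain integers standing in for query results. It shows that a loop only triggers a back-end call when the current page is used up, so at most one page is held in memory at a time.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Simplified model of a lazily paged result set (NOT the OpenCMIS implementation). */
class PagedResults implements Iterable<Integer> {
    private final int totalHits;
    private final int pageSize;
    int backendCalls = 0; // number of simulated repository calls so far

    PagedResults(int totalHits, int pageSize) {
        if (totalHits < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("totalHits must be >= 0 and pageSize > 0");
        }
        this.totalHits = totalHits;
        this.pageSize = pageSize;
    }

    /** Stand-in for one repository call: returns the hits in [skip, skip + pageSize). */
    private List<Integer> fetchPage(int skip) {
        backendCalls++;
        int end = Math.min(skip + pageSize, totalHits);
        List<Integer> page = new ArrayList<>();
        for (int i = skip; i < end; i++) {
            page.add(i);
        }
        return page;
    }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private List<Integer> page = new ArrayList<>();
            private int posInPage = 0;
            private int consumed = 0;

            @Override
            public boolean hasNext() {
                return consumed < totalHits;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                if (posInPage == page.size()) {
                    // Current page is used up: fetch the next one on demand.
                    page = fetchPage(consumed);
                    posInPage = 0;
                }
                consumed++;
                return page.get(posInPage++);
            }
        };
    }

    public static void main(String[] args) {
        PagedResults results = new PagedResults(250, 100);
        int seen = 0;
        for (int hit : results) {
            seen++; // process one hit at a time; only one page is in memory
        }
        System.out.println(seen + " hits, " + results.backendCalls + " back-end calls");
    }
}
```

Breaking out of the loop early means the remaining pages are never requested, which is the main advantage over collecting everything into a list first.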

And another aspect: get rid of the ORDER BY and sort the list later in Java. You have the result set in memory anyway. If the repository has no index on cmis:name, the ORDER BY can slow down query processing on the server side.
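A minimal sketch of the client-side sort, using only the standard library. FolderHit is a hypothetical holder for the three selected columns; with OpenCMIS you would first copy the values out of each query result. The case-insensitive ordering is an assumption and may not match the collation the repository would apply for ORDER BY.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sorting query hits in memory instead of using ORDER BY (illustrative only). */
class ClientSideSort {

    /** Hypothetical holder for the selected columns of one hit. */
    static final class FolderHit {
        final String objectId;
        final String name;
        final String parentId;

        FolderHit(String objectId, String name, String parentId) {
            this.objectId = objectId;
            this.name = name;
            this.parentId = parentId;
        }
    }

    /** Returns a new list sorted by name, ignoring case; the input list is left untouched. */
    static List<FolderHit> sortByName(List<FolderHit> hits) {
        List<FolderHit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparing((FolderHit h) -> h.name, String.CASE_INSENSITIVE_ORDER));
        return sorted;
    }

    public static void main(String[] args) {
        List<FolderHit> hits = new ArrayList<>();
        hits.add(new FolderHit("id-3", "zeta", "root"));
        hits.add(new FolderHit("id-1", "Alpha", "root"));
        hits.add(new FolderHit("id-2", "beta", "root"));
        for (FolderHit hit : sortByName(hits)) {
            System.out.println(hit.name + " (" + hit.objectId + ")");
        }
    }
}
```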