In my Solr database I have the following structure: each parent document represents a person's name (a dictionary), and each parent contains nested child documents (a nested list of dictionaries), which are the documents that match that person's name.
When I try to cluster the information in a way that makes sense, I am only able to cluster the child documents directly, which results in a bunch of clustered keywords that belong to those texts.
Ideally, I would like to cluster people (parent documents) by the similarity of their nested child documents. So rather than having keywords from the texts clustered together, I would like to cluster the names of people who have similar content.
E.g. if the profiles for Bob, John, and Lewis all have child documents containing the text "We are highly skilled in Python", and the profiles for Dan, Maria, and Chris have child documents containing the text "We are highly skilled in Java", I would like one cluster of (Bob, John, Lewis) and another of (Dan, Maria, Chris). Clicking on the first cluster would then show "We are highly skilled in Python", and clicking on the second would show "We are highly skilled in Java".
Is there a way of reproducing such a structure in Carrot2 Workbench?
Unfortunately not. This is a pretty specific scenario and we aim to keep Workbench a generic tool with Solr being one of many document sources.
For this kind of parent-child clustering, you'd need to use the Carrot2 Java or REST API directly:
As a result of the above procedure, you'll have a set of clusters containing parent documents, grouped by the textual content of their child documents.
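One way to prepare the input for this kind of clustering is to merge each parent's child texts into a single piece of text, so that every person becomes one document. The sketch below shows only that preparation step in plain Java. It is an illustration under assumptions, not the exact procedure: the class name `ParentTextBuilder`, the method `buildParentTexts`, and the input map are hypothetical, and fetching the documents from Solr and calling Carrot2 are left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ParentTextBuilder {

    // Assumption: each parent (a person's name) maps to the texts of its
    // nested child documents, already fetched from Solr by some other code.
    public static Map<String, String> buildParentTexts(Map<String, List<String>> childTextsByParent) {
        Map<String, String> textByParent = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : childTextsByParent.entrySet()) {
            // Join all child texts so the parent is represented by their combined content.
            textByParent.put(entry.getKey(), String.join("\n", entry.getValue()));
        }
        return textByParent;
    }

    public static void main(String[] args) {
        Map<String, List<String>> profiles = new LinkedHashMap<>();
        profiles.put("Bob", List.of("We are highly skilled in Python"));
        profiles.put("Dan", List.of("We are highly skilled in Java"));

        // Each entry is now one candidate document (title = person, body = combined
        // child text) that could be handed to a clustering algorithm.
        buildParentTexts(profiles).forEach((name, text) -> System.out.println(name + ": " + text));
    }
}
```

With the person's name kept as the document's identifier, the clusters produced from the combined text would list people rather than individual child documents.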