Is it possible to use a group by count in the havingPredicate for a CoreData fetch (for dupe detection)?


For reference, the problem I'm trying to solve is efficiently finding and removing duplicates in a table that could have a lot of entries.

The table I'm working with is called PersistedDay and has a dayString attribute (it's a string :-P). It has more columns that aren't relevant to this question. I'd like to find any PersistedDay objects whose dayString is duplicated.

In SQL, this is one efficient way to do that (FYI, I can run this query directly against the SQLite database backing Core Data):

SELECT ZDAYSTRING FROM ZPERSISTEDDAY GROUP BY ZDAYSTRING HAVING COUNT(ZDAYSTRING) > 1;

This returns ONLY the dayStrings that have duplicates. You can then get all of the fields for those objects by querying with the resulting day strings (or use this query as a subquery to do it all in one request).
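To make the semantics of `GROUP BY ... HAVING COUNT(...) > 1` concrete, here is a minimal in-memory Swift sketch of what that query computes. It assumes the day strings are already loaded into an array, and `duplicatedDayStrings` is a hypothetical helper name; it only illustrates the result, since the whole point here is to have the store do this work instead of the app.

```swift
/// Mirrors `SELECT x FROM t GROUP BY x HAVING COUNT(x) > 1`:
/// group equal values, then keep only groups with more than one member.
/// (Hypothetical helper, for illustration only.)
func duplicatedDayStrings(_ dayStrings: [String]) -> [String] {
    // GROUP BY: bucket identical strings together.
    let groups = Dictionary(grouping: dayStrings, by: { $0 })
    // HAVING COUNT(...) > 1: drop groups that contain a single row.
    return groups
        .filter { $0.value.count > 1 }
        .map { $0.key }
        .sorted() // SQL has no defined order without ORDER BY; sort for stable output
}

let sample = ["2014-01-01", "2014-01-02", "2014-01-01", "2014-01-03", "2014-01-03"]
print(duplicatedDayStrings(sample)) // prints ["2014-01-01", "2014-01-03"]
```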

NSFetchRequest seems to have all of the pieces needed to do this too, but I can't get it to work. Here's what I tried:

NSManagedObjectContext *context = [self managedObjectContext];

NSFetchRequest *request = [[NSFetchRequest alloc] init];
NSEntityDescription *entity = [NSEntityDescription entityForName:@"PersistedDay" inManagedObjectContext:context];
[request setEntity:entity];

NSPropertyDescription* dayStringProperty = entity.propertiesByName[@"dayString"];

request.propertiesToFetch = @[dayStringProperty];
request.propertiesToGroupBy = @[dayStringProperty];
request.havingPredicate = [NSPredicate predicateWithFormat: @"dayString.@count > 1"];
request.resultType = NSDictionaryResultType;

NSArray *results = [context executeFetchRequest:request error:NULL];

That doesn't work. :-P Executing the fetch fails with the error "Unsupported function expression count:(dayString)". I don't think the dayString in "dayString.@count" even matters in the code above, but I included it for clarity (in SQL, COUNT just operates on the grouped rows).

So, my question is: is this possible and, if so, what is the syntax to do it? I couldn't find anything in the Core Data docs that indicates how to do this.

I found one similar SO post, which unfortunately I can't find again, about running a count in a having clause (I don't think it used a group by). But the poster gave up and did it a different way after not finding a solution. I'm hoping this question is more explicit, so maybe someone has an answer. :)

For reference, here is what I'm doing for now. It DOES work, but because most of the time there are very few duplicates, it returns nearly one row per object:

NSManagedObjectContext *context = [self managedObjectContext];

NSFetchRequest *request = [[NSFetchRequest alloc] init];
NSEntityDescription *entity = [NSEntityDescription entityForName:@"PersistedDay"
                                          inManagedObjectContext:context];
[request setEntity:entity];

NSPropertyDescription* dayStringProperty = entity.propertiesByName[@"dayString"];

// Get the count of dayString...
NSExpression *keyPathExpression = [NSExpression expressionForKeyPath: @"dayString"]; // Does not really matter
NSExpression *countExpression = [NSExpression expressionForFunction: @"count:" arguments: [NSArray arrayWithObject:keyPathExpression]];
NSExpressionDescription *expressionDescription = [[NSExpressionDescription alloc] init];
[expressionDescription setName: @"dayStringCount"];
[expressionDescription setExpression: countExpression];
[expressionDescription setExpressionResultType: NSInteger32AttributeType];

request.propertiesToFetch = @[dayStringProperty, expressionDescription];
request.propertiesToGroupBy = @[dayStringProperty];
request.resultType = NSDictionaryResultType;

NSArray *results = [context executeFetchRequest:request error:NULL];

I then have to loop over the results and keep only those with dayStringCount > 1, which is exactly what the having clause should do. :-P
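The client-side filtering step described above can be sketched like this. It assumes each result row has been bridged to a Swift `[String: Any]` dictionary carrying the `dayString` and `dayStringCount` keys from the grouped fetch; `duplicateRows(in:)` is a hypothetical helper name. This is the work a working `havingPredicate` would push down into the store.

```swift
/// Keeps only grouped rows whose count is greater than one, i.e. the
/// filtering a HAVING clause would do. Rows are expected to carry the
/// "dayString" and "dayStringCount" keys from the grouped fetch.
/// (Hypothetical helper, for illustration only.)
func duplicateRows(in rows: [[String: Any]]) -> [String] {
    rows.compactMap { row -> String? in
        // From Core Data the count arrives as an NSNumber, which bridges to Int
        // on Apple platforms; plain Int values (as in the sample below) also match.
        guard let day = row["dayString"] as? String,
              let count = row["dayStringCount"] as? Int,
              count > 1 else { return nil }
        return day
    }
}

let rows: [[String: Any]] = [
    ["dayString": "2014-01-01", "dayStringCount": 2],
    ["dayString": "2014-01-02", "dayStringCount": 1],
    ["dayString": "2014-01-03", "dayStringCount": 3],
]
print(duplicateRows(in: rows)) // prints ["2014-01-01", "2014-01-03"]
```

The cost the question complains about is visible here: every group has to be materialized and passed to the loop even though only a handful survive the filter.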

NOTE: I know Core Data isn't SQL. :) I'd just like to know whether I can do the equivalent operation with the same efficiency as SQL.


Solution

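  • Yes, it is possible. You cannot reference the count as a key path, but you can reference it as a variable whose name matches the expression description's name ("count" below), much like referring to an aggregate in a SQL HAVING clause. In my example I have cities created with duplicate names.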

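    import CoreData

    // Assumes `context` is an NSManagedObjectContext whose model has a "City"
    // entity with a String attribute "name".
    let cityEntity = NSEntityDescription.entity(forEntityName: "City", in: context)!
    let fetchRequest = NSFetchRequest<NSDictionary>(entityName: "City")
    
    let nameExpr = NSExpression(forKeyPath: "name")
    let countExpr = NSExpressionDescription()
    let countVariableExpr = NSExpression(forVariable: "count")
    
    // Aggregate column: count(name) per group, exposed under the name "count"
    countExpr.name = "count"
    countExpr.expression = NSExpression(forFunction: "count:", arguments: [nameExpr])
    countExpr.expressionResultType = .integer64AttributeType
    
    fetchRequest.resultType = .dictionaryResultType
    fetchRequest.sortDescriptors = [NSSortDescriptor(key: "name", ascending: true)]
    fetchRequest.propertiesToGroupBy = [cityEntity.propertiesByName["name"]!]
    fetchRequest.propertiesToFetch = [cityEntity.propertiesByName["name"]!, countExpr]
    
    // filter out group result and return only groups that have duplicates
    fetchRequest.havingPredicate = NSPredicate(format: "%@ > 1", countVariableExpr)
    
    // One dictionary per duplicated name, with "name" and "count" keys
    let duplicates = try context.fetch(fetchRequest)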
    

    Complete playground file at: https://gist.github.com/pronebird/cca9777af004e9c91f9cd36c23cc821c
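Once the duplicated values are known, the removal step the question is after comes down to keeping one object per value and deleting the rest. Here is a minimal in-memory sketch of that selection; `DayRecord` is a hypothetical stand-in for a fetched PersistedDay, and keeping the first occurrence is an assumption (you might prefer to keep the newest).

```swift
/// Stand-in for a fetched PersistedDay (hypothetical type for illustration).
struct DayRecord {
    let id: Int
    let dayString: String
}

/// Returns the records to delete so that each dayString survives exactly once.
/// The first record seen for each dayString is kept.
func recordsToDelete(_ records: [DayRecord]) -> [DayRecord] {
    var seen = Set<String>()
    var extras: [DayRecord] = []
    for record in records {
        // insert(_:) reports whether the value was new; if not, this is a duplicate.
        if !seen.insert(record.dayString).inserted {
            extras.append(record)
        }
    }
    return extras
}

let records = [
    DayRecord(id: 1, dayString: "2014-01-01"),
    DayRecord(id: 2, dayString: "2014-01-02"),
    DayRecord(id: 3, dayString: "2014-01-01"),
    DayRecord(id: 4, dayString: "2014-01-01"),
]
print(recordsToDelete(records).map { $0.id }) // prints [3, 4]
```

With real managed objects you would pass each extra to `context.delete(_:)` and then save. Restricting that second fetch to the duplicated values (for example with a `dayString IN %@` predicate) keeps the number of objects loaded small.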