objective-c regex core-data nsexpression

Using NSExpression to evaluate a calculated field on Core Data: variable substitution and formula storage


This is something of a design question. Let's say I have a Core Data model with two entities: Item and Formula.

Item has 3 numeric attributes "X", "Y" and "Z", and a to-one relation to a "Formula" entity.

Formula has one string attribute containing expressions like "(X*Y*Z)**(1.0/3)" or "Pi * X**3 / 3.0" etc. Any simple arithmetic using constant numbers, standard operators (addition, subtraction, multiplication, division, power, parentheses) and the "X" "Y" and "Z" symbols.

My task is the obvious one: how do I add a new attribute called "value" to the "Item" entity, calculated by plugging the X, Y and Z values into the related "Formula" and evaluating the expression?

Considerations:
1. There may be millions of "Item" entities and hundreds of "Formula" entities.
2. I have control over the format in which Formula strings are created. I can have people type "$X+$Y" instead of "X+Y" if that makes things easier.
3. I will need to quickly calculate further statistics on the "value" attribute across subsets of items (sums, medians, standard deviations, averages, etc.).

My questions:
1. In general, how should I go about it? Add a real numeric "value" attribute to cache the calculated results, or a calculated property that recalculates whenever it is read?
2. How do I use NSExpression to plug in values in place of variable symbols like "X", "Y" and "Z"?
3. Can I somehow pre-create an NSExpression, cache it as another attribute of "Formula", and use it later instead of parsing and evaluating the formula for each item? How can one store a parsed NSExpression in Core Data?

I know this is a big question with many sub-questions. Any hint will be appreciated!


Solution

  • Actual answers were sparse, so I ended up using an open-source evaluator called DDMathParser by Dave DeLong, which works much like NSExpression but is much simpler to use and is extensible.

    In my model, I subclassed NSManagedObject for both "Item" and "Formula", and added a calculated read-only property to each, as follows:

    In "MyItem.m":

    // Declare, for KVO, which key paths the calculated value depends on.
    // A change to any of them means the value must be recalculated.
    + (NSSet *)keyPathsForValuesAffectingCalculatedValue {
        return [NSSet setWithObjects:@"x", @"y", @"z", @"formula.expression", nil];
    }

    - (double)calculatedValue {
        NSError *error = nil;
        NSDictionary *s = @{ @"X": @(self.x), @"Y": @(self.y), @"Z": @(self.z) };

        NSNumber *result = [[DDMathEvaluator defaultMathEvaluator] evaluateExpression:self.formula.expression withSubstitutions:s error:&error];
        if (result == nil) {
            // Every path must return a value; NAN marks a failed evaluation.
            NSLog(@"Error calculating value: %@", error);
            return NAN;
        }
        return [result doubleValue];
    }
    

    and in my "MyFormula.m":

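    @dynamic expressionParsingError;

    // The parsed expression depends only on the formula's string attribute.
    + (NSSet *)keyPathsForValuesAffectingExpression {
        return [NSSet setWithObjects:@"formulaString", nil];
    }

    // Parses the formula string on every access; only the parsing error is kept.
    - (DDExpression *)expression {
        NSError *err = nil;
        // Read the same attribute that is declared as a dependency above.
        DDExpression *exp = [DDExpression expressionFromString:self.formulaString error:&err];
        self.expressionParsingError = err;
        return err ? nil : exp;
    }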
    

    I do not store or even cache parsed expressions or calculated results. The only thing I cache is one error object, for displaying and reporting parsing errors in bad formulae. I receive this NSError from the DDMathParser engine when it converts the formula string into a DDExpression.

    I could make these properties proper transient attributes of the model, but since performance was good from the start, I saw no need to do so yet. I'll probably revisit my solution sometime in the future.

    Then, in my tables and algorithms, I can simply bind my Mac OS X app's table columns to the "calculatedValue" property, and it gets calculated automatically on demand (although the result isn't cached).
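
    For the statistics mentioned in the question, KVC collection operators are one option once the items are in memory. This is only a sketch: the helper name SummarizeCalculatedValues is hypothetical, and it assumes every object in the array responds to the "calculatedValue" key. Because "calculatedValue" is computed in code rather than stored, this runs over fetched objects and cannot be pushed down into a fetch request.

    ```objc
    #import <Foundation/Foundation.h>

    // Hypothetical helper: sum and average of "calculatedValue" over in-memory items.
    // Returns an empty dictionary for an empty array.
    static NSDictionary<NSString *, NSNumber *> *SummarizeCalculatedValues(NSArray *items) {
        if (items.count == 0) {
            return @{};
        }
        // Each operator calls valueForKey:@"calculatedValue" on every element.
        NSNumber *sum = [items valueForKeyPath:@"@sum.calculatedValue"];
        NSNumber *average = [items valueForKeyPath:@"@avg.calculatedValue"];
        return @{ @"sum": sum, @"average": average };
    }
    ```

    KVC has no median or standard deviation operator, so those would still need their own loop over the values.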

    In the future, I could drop DDMathParser and move back to NSExpression. However, DDMathParser lets me do wonderful things, like displaying parsing errors immediately as users edit their formulas. I also provide a little "sandbox" where users can try out their formulae with fake numbers and check that they work before applying them to millions of items.
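
    For anyone who wants to stay with NSExpression, here is a minimal sketch of the substitution asked about in the question. It is not what I use. It assumes formulas are written with bare X, Y and Z, which NSExpression parses as key paths and looks up on the object passed in. The helper name EvaluateFormula is hypothetical. Note that NSExpression raises an exception on a malformed formula instead of returning an NSError, hence the @try.

    ```objc
    #import <Foundation/Foundation.h>

    // Hypothetical helper: parses a formula such as @"X * Y + Z" and evaluates it.
    // Returns nil if the formula cannot be parsed or evaluated.
    static NSNumber *EvaluateFormula(NSString *formula, double x, double y, double z) {
        @try {
            // No format arguments are used, so pass an empty argument array.
            NSExpression *expression = [NSExpression expressionWithFormat:formula
                                                            argumentArray:@[]];
            // X, Y and Z are resolved with valueForKeyPath: on this object.
            NSDictionary *values = @{ @"X": @(x), @"Y": @(y), @"Z": @(z) };
            id result = [expression expressionValueWithObject:values context:nil];
            return [result isKindOfClass:[NSNumber class]] ? result : nil;
        } @catch (NSException *exception) {
            NSLog(@"Could not evaluate formula '%@': %@", formula, exception.reason);
            return nil;
        }
    }
    ```

    Since the symbols are plain key paths, any KVC-compliant object with matching keys could stand in for the dictionary.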

    On my rather old MacBook Pro (2009), analyzing 10,000 items feels practically immediate. Remember that cell-based tables don't evaluate the whole column, just the visible part.

    I hope this helps...