rdata.tablepivot-tablecalculated-columnsrpivottable

Addition of calculated field in rpivotTable


I want to create a calculated field to use with the rpivotTable package, similar to the functionality seen in excel.

For instance, consider the following table:

+--------------+--------+---------+-------------+-----------------+
| Manufacturer | Vendor | Shipper | Total Units | Defective Units |
+--------------+--------+---------+-------------+-----------------+
| A            | P      | X       |      173247 |           34649 |
| A            | P      | Y       |      451598 |          225799 |
| A            | P      | Z       |      759695 |          463414 |
| A            | Q      | X       |      358040 |          225565 |
| A            | Q      | Y       |      102068 |           36744 |
| A            | Q      | Z       |      994961 |          228841 |
| A            | R      | X       |      454672 |          231883 |
| A            | R      | Y       |      275994 |          124197 |
| A            | R      | Z       |      691100 |          165864 |
| B            | P      | X       |      755594 |          302238 |
| .            | .      | .       |           . |               . |
| .            | .      | .       |           . |               . |
+--------------+--------+---------+-------------+-----------------+

(my actual table has many more columns, both dimensions and measures, time, etc. and I need to define multiple such "calculated columns")

If I want to calculate defect rate (which would be Defective Units/Total Units) and I want to aggregate by either of the first three columns, I'm not able to.

I tried assignment by reference (:=), but that still didn't seem to work and summed up defect rates (i.e., sum(Defective_Units/Total_Units)), instead of sum(Defective_Units)/sum(Total_Units):

myData[, Defect.Rate := Defective_Units / Total_Units]

This ended up giving my defect rates greater than 1. Is there anywhere I can declare a calculated field, which is just a formula evaluated post aggregation?


Solution

  • You're lucky - the creator of pivottable.js foresaw cases like yours (and mine, earlier today) by implementing an aggregator called "Sum over Sum" and a few more, likewise, cf. https://github.com/nicolaskruchten/pivottable/blob/master/src/pivot.coffee#L111 and https://github.com/nicolaskruchten/pivottable/blob/master/src/pivot.coffee#L169.

    So we'll use "Sum over Sum" as parameter "aggregatorName", and the columns whose quotient we want in the "vals" parameter.

    Here's a meaningless usage example from the mtcars data for reproducibility:

    require(rpivotTable)
    data(mtcars)
    rpivotTable(mtcars,rows="gear", cols=c("cyl","carb"),
                aggregatorName = "Sum over Sum",
                vals =c("mpg","disp"),
                width="100%", height="400px")