
R Add CDF Columns to DataFrame


Suppose I have the following R dataframe:

[image: the input data frame]

The Peril and Range columns are both factors. I want to add cumulative distribution columns for Counts and Value, computed within each Peril, like so:

[image: the desired output with the CDF columns added]

How would I do this? I am using dplyr if that helps.


Solution

  • Assuming your data is stored in df, with columns named Counts and Values, this should work:

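    library(dplyr)

    df %>%
      group_by(Peril) %>%
      # cumsum() follows row order, so rows should already be sorted by Range within each Peril
      mutate(
        `Count CDF` = cumsum(Counts) / sum(Counts),
        `Values CDF` = cumsum(Values) / sum(Values)
      )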
    

    However, your first and second tables seem to have different counts and values for the 'Other' Peril.
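
Since the original tables were posted as images, here is a minimal, self-contained sketch of the same approach on made-up data. The Peril and Range values and all the numbers below are hypothetical, not taken from the question. The arrange() and ungroup() calls are additions: arrange() makes the row order explicit (it sorts factors by level order), and ungroup() drops the grouping afterwards.

```r
library(dplyr)

# Hypothetical data: the values below are invented for illustration only.
df <- data.frame(
  Peril  = factor(c("Fire", "Fire", "Fire", "Other", "Other")),
  Range  = factor(c("0-10", "10-20", "20-30", "0-10", "10-20")),
  Counts = c(2, 3, 5, 1, 3),
  Values = c(10, 30, 60, 5, 15)
)

result <- df %>%
  arrange(Peril, Range) %>%  # cumsum() depends on row order, so sort first
  group_by(Peril) %>%        # running totals restart for each Peril
  mutate(
    `Count CDF`  = cumsum(Counts) / sum(Counts),
    `Values CDF` = cumsum(Values) / sum(Values)
  ) %>%
  ungroup()                  # drop grouping so later steps see a plain table

print(result)
```

Within each Peril, both CDF columns should rise to exactly 1 on the last row. If they do not, the most likely cause is that the grouping column or the row order is not what you expected.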