rdplyr

How to split one row of dataframe into two rows based on several conditions (R)


I have a source dataset that looks like the following:

cat1 <- c('Text1','Text1', 'Text2','Text1', 'Text1','Text4', 'Text1','Text3', 'Text1','Text1', 'Text2', 'Text1')
cat2 <- c('AA','AA','BB','BB','AA','BB','AA','BB','AA','BB','AA','AA')
col3 <- c('A' ,'B' ,'C' ,'D' ,'E' ,'F' ,'G' ,'H' ,'I' ,'J' ,'K' ,'L' )
grp1 <- c('Block 1','Block 2','Block 1','Block 2','Block 3','Block 3', 'Block 4', 'Block 1', 'Block 4', 'Block 3', 'Block 2', 'Block 1')
nums <- c(100, 200, 300, 400, 100, 200, 300, 400, 100, 200, 300, 400)
orig_src_df <- cbind(cat1, cat2, col3, grp1, nums) |> 
  as.data.frame() |>
  mutate(nums = as.numeric(nums))

I have a 2nd dataset that looks like this:

grp1 <- c('Block 1', 'Block 1', 'Block 2', 'Block 3', 'Block 4', 'Block 4')
grp2 <- c('Block 1', 'Block 5', 'Block 5', 'Block 6', 'Block 6', 'Block 5')
ratio <-c(0.15, 0.85, 1, 1, 0.25, 0.75)

conversion_df <- cbind(grp1, grp2, ratio) |>
  as.data.frame() |>
  mutate(
    cat1 = 'Text1',
    cat2 = 'AA',
    ratio = as.numeric(ratio)
  )

Anytime I have values where cat1 == "Text1" and cat2 == "AA", I want to split the values of nums by the ratio in the 2nd dataset and add the grp2 column as it will supersede grp1.

What is a clean way in which I can do this?


Solution

  • Maybe like this?

    You mentioned "add the grp2 column as it will supersede grp1", so here I have altered grp1 to be the new grp2 only when a match was found in the 2nd table. coalesce will output the first non-missing (non-NA) value, so we can use that there, and also to make nums_new use the ratio when available, or 1 if not.

    orig_src_df |>
      left_join(conversion_df) |>
      transmute(cat1, cat2, col3,
                grp1 = coalesce(grp2, grp1),
                nums_orig = nums,
                nums_new = coalesce(ratio, 1) * nums)
    

    Result

    Joining with `by = join_by(cat1, cat2, grp1)`
        cat1 cat2 col3    grp1 nums_orig nums_new
    1  Text1   AA    A Block 1       100       15
    2  Text1   AA    A Block 5       100       85
    3  Text1   AA    B Block 5       200      200
    4  Text2   BB    C Block 1       300      300
    5  Text1   BB    D Block 2       400      400
    6  Text1   AA    E Block 6       100      100
    7  Text4   BB    F Block 3       200      200
    8  Text1   AA    G Block 6       300       75
    9  Text1   AA    G Block 5       300      225
    10 Text3   BB    H Block 1       400      400
    11 Text1   AA    I Block 6       100       25
    12 Text1   AA    I Block 5       100       75
    13 Text1   BB    J Block 3       200      200
    14 Text2   AA    K Block 2       300      300
    15 Text1   AA    L Block 1       400       60
    16 Text1   AA    L Block 5       400      340