
How to sum ifelse statements on the fly with [R]


I have an R conundrum and would be very grateful for any assistance. I need to write a piece of code that has to be written in one line to fit into a larger automated process. I have supplied some dummy data to help illustrate.

I have three ifelse statements that return 1’s or 0’s. I need to sum these 1’s and 0’s yet because of other inherited constraints in my real data I can’t refer to their output ‘and then’ sum them. I ‘need’ to sum them on the fly.

To be explicit… I cannot refer to the output 1’s and 0’s of ‘use_sms’, ‘use_data’ or ‘use_voice’, and I cannot just call apply(…, 1, sum) on the dataframe.

Somehow, what I need is a fully contained sum of the three ifelse’s, something along the lines of… in crude non r language…

sum(
ifelse(sms_rev0 & sms_cnt0 > 0 | sms_rev1 & sms_cnt1 > 0 | sms_rev2 & sms_cnt2 > 0, 1, 0),
ifelse(data_rev0 & data_cnt0 > 0 | data_rev1 & data_cnt1 > 0 | data_rev2 & data_cnt2 > 0, 1, 0),
ifelse(voice_rev0 & voice_cnt0 > 0 | voice_rev1 & voice_cnt1 > 0 | voice_rev2 & voice_cnt2 > 0, 1, 0)
) 

My real data is presented to me similar to this headache_df

set.seed(1234)  # seed before sampling so the dummy data is reproducible
headache_df = data.frame(sms_rev0 = sample(1:0, 10, replace = T),
                        sms_cnt0 = sample(1:0, 10, replace = T),
                        sms_rev1 = sample(1:0, 10, replace = T),
                        sms_cnt1 = sample(1:0, 10, replace = T),
                        sms_rev2 = sample(1:0, 10, replace = T),
                        sms_cnt2 = sample(1:0, 10, replace = T),
                        data_rev0 = sample(1:0, 10, replace = T),
                        data_cnt0 = sample(1:0, 10, replace = T),
                        data_rev1 = sample(1:0, 10, replace = T),
                        data_cnt1 = sample(1:0, 10, replace = T),
                        data_rev2 = sample(1:0, 10, replace = T),
                        data_cnt2 = sample(1:0, 10, replace = T),
                        voice_rev0 = sample(1:0, 10, replace = T),
                        voice_cnt0 = sample(1:0, 10, replace = T),
                        voice_rev1 = sample(1:0, 10, replace = T),
                        voice_cnt1 = sample(1:0, 10, replace = T),
                        voice_rev2 = sample(1:0, 10, replace = T),
                        voice_cnt2 = sample(1:0, 10, replace = T))

row.names(headache_df) = paste0("row", 1:10)

And I am looking to capture my results in this headache-combating panado_df

panado_df = data.frame(user = row.names(headache_df))
attach(headache_df)  # makes the columns visible by bare name in the expressions below

I generate three ifelse statements to illustrate, but in my real data it's really the sum of these that I need to capture.

panado_df$use_sms = ifelse(sms_rev0 & sms_cnt0 > 0 | sms_rev1 & sms_cnt1 > 0 | sms_rev2 & sms_cnt2 > 0, 1, 0)
panado_df$use_data = ifelse(data_rev0 & data_cnt0 > 0 | data_rev1 & data_cnt1 > 0 | data_rev2 & data_cnt2 > 0, 1, 0)
panado_df$use_voice = ifelse(voice_rev0 & voice_cnt0 > 0 | voice_rev1 & voice_cnt1 > 0 | voice_rev2 & voice_cnt2 > 0, 1, 0)
rownames(panado_df) = panado_df$user
panado_df$user = NULL

I present a target column to illustrate what my calculated data should look like. Any cool solutions to achieve my aim please?

panado_df$target_column = apply(panado_df, 1, sum)

Solution

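  • If I understand you correctly, you are looking for a single expression that gives the per-row total. Note that sum() would collapse the three vectors into one grand total, so add the ifelse() results element-wise with + instead (this relies on the attach(headache_df) call from your question):

    # element-wise addition: one value (0 to 3) per row
    panado_df$sums_3 <- ifelse(sms_rev0 & sms_cnt0 > 0 | sms_rev1 & sms_cnt1 > 0 | sms_rev2 & sms_cnt2 > 0, 1, 0) +
        ifelse(data_rev0 & data_cnt0 > 0 | data_rev1 & data_cnt1 > 0 | data_rev2 & data_cnt2 > 0, 1, 0) +
        ifelse(voice_rev0 & voice_cnt0 > 0 | voice_rev1 & voice_cnt1 > 0 | voice_rev2 & voice_cnt2 > 0, 1, 0)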
    

    Your code could also be made more descriptive (much as you wrote it) using dplyr, as follows:

    library(dplyr)

    panado_df <- headache_df %>%
        mutate(use_sms = ifelse(sms_rev0 & sms_cnt0 > 0 | sms_rev1 & sms_cnt1 > 0 | sms_rev2 & sms_cnt2 > 0, 1, 0),
            use_data = ifelse(data_rev0 & data_cnt0 > 0 | data_rev1 & data_cnt1 > 0 | data_rev2 & data_cnt2 > 0, 1, 0),
            use_voice = ifelse(voice_rev0 & voice_cnt0 > 0 | voice_rev1 & voice_cnt1 > 0 | voice_rev2 & voice_cnt2 > 0, 1, 0)) %>%
        rowwise() %>%  # so that sum() below is evaluated once per row
        mutate(target_column = sum(use_sms, use_data, use_voice))
    

    If you'd like to return the vector target_column directly, load the magrittr library as well and use its %$% operator:

    library(dplyr)
    library(magrittr)  # provides %$%

    target_column <- headache_df %>%
        mutate(use_sms = ifelse(sms_rev0 & sms_cnt0 > 0 | sms_rev1 & sms_cnt1 > 0 | sms_rev2 & sms_cnt2 > 0, 1, 0),
            use_data = ifelse(data_rev0 & data_cnt0 > 0 | data_rev1 & data_cnt1 > 0 | data_rev2 & data_cnt2 > 0, 1, 0),
            use_voice = ifelse(voice_rev0 & voice_cnt0 > 0 | voice_rev1 & voice_cnt1 > 0 | voice_rev2 & voice_cnt2 > 0, 1, 0)) %>%
        rowwise() %>%
        mutate(target_column = sum(use_sms, use_data, use_voice)) %$%
        target_column
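
For reference, here is a minimal base R sketch of the difference between adding conditions element-wise with + and passing them to sum(). The data frame df and its values are made up for illustration, and it uses only one period per service (the "0" columns) rather than three, so treat it as an assumption-laden toy and not as the real data layout. It uses with() instead of attach().

```r
# Hypothetical toy data: one rev/cnt pair per service, three rows.
df <- data.frame(
  sms_rev0   = c(1, 0, 0), sms_cnt0   = c(1, 1, 0),
  data_rev0  = c(0, 1, 0), data_cnt0  = c(1, 1, 0),
  voice_rev0 = c(1, 1, 0), voice_cnt0 = c(1, 0, 0)
)

# Each parenthesised condition is a logical vector; `+` coerces
# TRUE/FALSE to 1/0 and adds element-wise, giving one total per row.
df$target_column <- with(df,
  (sms_rev0 & sms_cnt0 > 0) +
  (data_rev0 & data_cnt0 > 0) +
  (voice_rev0 & voice_cnt0 > 0))

# sum() instead collapses everything it is given into a single number.
grand_total <- with(df,
  sum(sms_rev0 & sms_cnt0 > 0,
      data_rev0 & data_cnt0 > 0,
      voice_rev0 & voice_cnt0 > 0))

print(df$target_column)  # one value per row
print(grand_total)       # a single scalar
```

Because a comparison such as sms_cnt0 > 0 is evaluated before &, and & before |, the conditions in the question group as (rev & (cnt > 0)) | …, so no extra parentheses are needed inside each ifelse().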