r dplyr data-wrangling

Get mode from list of pipe-separated digits


I have a character variable containing codes describing project characteristics. It looks like this:

[1] "151"     "510|130|130"     "311|110" "140"     "160|160"     "160|160|130"
[7] "160"     "160"     "160"      "151"     "151"     "160|110"    

I need to extract the main characteristic of each project, meaning the code that occurs most often. If no code dominates, I take the first one. The result should be:

[1] "151"     "130"     "311"      "140"     "160"     "160"
[7] "160"     "160"     "160"      "151"     "151"     "160"    

Any suggestion on how to achieve this?


Solution

  • You can use strsplit to split each element on the pipe and collapse::fmode to get the value that dominates (the statistical mode). If there is a tie, fmode returns the first value, which is its default behavior:

    x <- c("151", "510|130|130", "311|110")
    as.numeric(sapply(strsplit(x, "\\|"), collapse::fmode))
    #[1] 151 130 311
    

    Other ways of making a mode function, which is not directly implemented in base R, can be found here.
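
If you would rather not add a dependency, the same idea can be sketched in base R. The helper name first_mode below is hypothetical and not part of the original answer. It relies on which.max returning the first maximum, so ties resolve to the value that appears first:

```r
# Hypothetical helper (not from the original answer): most frequent value,
# with ties broken in favour of the value that appears first.
first_mode <- function(v) {
  u <- unique(v)                    # distinct values, in order of first appearance
  counts <- tabulate(match(v, u))   # occurrences of each distinct value
  u[which.max(counts)]              # which.max() picks the first maximum
}

x <- c("151", "510|130|130", "311|110", "140", "160|160", "160|160|130")

# fixed = TRUE treats "|" as a literal character rather than regex alternation
result <- vapply(strsplit(x, "|", fixed = TRUE), first_mode, character(1))
result
#[1] "151" "130" "311" "140" "160" "160"
```

Unlike the as.numeric() version above, this keeps the codes as character strings, which matches the output format asked for in the question.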