r dataframe for-loop acronym

Assigning full names to acronyms


Currently I have a data frame (hotspot_mockup) that holds the acronyms of unique cancer types, like so:

Cancer  Gene
AASTR   IDH1
ACRM    NRAS

In another data frame (new_hotspot_cancers), I have these 184 unique acronyms and their corresponding full names. This is in the form:

Acronym  Full Name
AASTR    Anaplastic Astrocytoma
ACRM     Acral Melanoma

I want to replace the acronyms in the first data frame with the corresponding full names from the second data frame (assuming, of course, that the acronym exists in the second data frame). Overall, I want the result to look like:

Cancer                  Gene
Anaplastic Astrocytoma  IDH1
Acral Melanoma          NRAS

I was thinking of some kind of "for" loop, but I know this is frowned upon in R. As always, any guidance would be greatly appreciated!


Solution

  • I was thinking of some kind of "for" loop, but I know this is frowned upon in R.

    It's not that for loops are frowned upon. It's that people with experience in other programming languages tend to reach for them in R when they are not needed, either because many R functions are already vectorized or because functions like lapply(), or map() from the purrr package, do the job of a for loop more concisely.
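
    To make that concrete, here is a minimal sketch comparing the three styles on a made-up vector of lowercase acronyms. The variable names are only for illustration and are not from the question.

    ```r
    # Hypothetical input: lowercase acronyms we want in upper case
    acronyms <- c("aastr", "acrm")

    # Loop version: fill a preallocated result one element at a time
    upper_loop <- character(length(acronyms))
    for (i in seq_along(acronyms)) {
      upper_loop[i] <- toupper(acronyms[i])
    }

    # Vectorized version: toupper() already works on the whole vector
    upper_vec <- toupper(acronyms)

    # Apply version: vapply() is a type-checked relative of lapply()
    upper_apply <- vapply(acronyms, toupper, character(1), USE.NAMES = FALSE)
    ```

    All three produce the same character vector; the vectorized call is simply the shortest way to write it.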

    In this case, you can just use left_join() from the dplyr package, which adds the full name as a new column next to the acronym.

    df1 <- data.frame(Cancer = c("AASTR", "ACRM"), Gene = c("IDH1", "NRAS"))
    df2 <- data.frame(Acronym = c("AASTR", "ACRM"), Full_Name = c("Anaplastic Astrocytoma", "Acral Melanoma"))
    
    dplyr::left_join(df1, df2, by = c("Cancer" = "Acronym"))
    
      Cancer Gene              Full_Name
    1  AASTR IDH1 Anaplastic Astrocytoma
    2   ACRM NRAS         Acral Melanoma
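
    The join above keeps the acronym and adds Full_Name beside it. If the goal is to overwrite Cancer with the full name, as in the question's desired output, one possible follow-up is sketched below. It assumes dplyr is installed, and the third row ("XYZ", "TP53") is a made-up example of an acronym with no match in the lookup table, added only to show the fallback.

    ```r
    library(dplyr)

    # Same toy data as above, plus one hypothetical unmatched acronym ("XYZ")
    df1 <- data.frame(Cancer = c("AASTR", "ACRM", "XYZ"),
                      Gene = c("IDH1", "NRAS", "TP53"),
                      stringsAsFactors = FALSE)
    df2 <- data.frame(Acronym = c("AASTR", "ACRM"),
                      Full_Name = c("Anaplastic Astrocytoma", "Acral Melanoma"),
                      stringsAsFactors = FALSE)

    result <- df1 %>%
      left_join(df2, by = c("Cancer" = "Acronym")) %>%
      # Use the full name where one was found, otherwise keep the acronym
      mutate(Cancer = coalesce(Full_Name, Cancer)) %>%
      # Drop the helper column so only Cancer and Gene remain
      select(-Full_Name)
    ```

    Here coalesce() takes the first non-missing value, so rows without a match in df2 keep their original acronym instead of becoming NA.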