rrenameanti-join

Quicker way? Remove rows in book1, take row 4 values as column name, set some column name same as book2


Below is the first dataframe where I want to remove the first 3 rows:

book1 <- structure(list(Instructions..xyz = c("Note: abc", "", "Set1", 
                                              "id", "632592651", "633322173", "634703802", "634927873", "635812953", 
                                              "636004739", "636101211", "636157799", "636263106", "636752420"
), X = c("", "", "", "title", "asdf", "cat", "dog", "mouse", 
         "elephant", "goose", "rat", "mice", "kitty", "kitten"), X.1 = c("", 
                                                                         "", "", "hazard", "y", "y", "y", "n", "n", "y", "y", "n", "n", 
                                                                         "y"), X.2 = c("", "", "Set2", "id", "632592651", "633322173", 
                                                                                       "634703802", "634927873", "635812953", "636004739", "636101211", 
                                                                                       "636157799", "636263106", "636752420"), X.3 = c("", "", "", "title", 
                                                                                                                                       "asdf2", "cat2", "dog2", "mouse2", "elephant2", "goose2", "rat2", 
                                                                                                                                       "mice2", "kitty2", "kitten2"), X.4 = c("", "", "", "index", "0.664883807", 
                                                                                                                                                                              "0.20089779", "0.752228086", "0.124729276", "0.626285086", "0.134537909", 
                                                                                                                                                                              "0.612526768", "0.769622463", "0.682532524", "0.819015658")), class = "data.frame", row.names = c(NA, 
                                                                                                                                                                                                                                                                                -14L))

I did book1 <- book1[-c(1:3),] but I'm not sure how to make id, title, hazard, id, title, index as the column name instead of Instructions..xyz, etc. See image below for desired output

enter image description here

Then for the second dataframe,

book2 <- structure(list(identity = c(632592651L, 633322173L, 634703802L, 
                                     634927873L, 635812953L, 636004739L, 636101211L, 636157799L, 636263106L, 
                                     636752420L, 636809222L, 2004722036L, 2004894388L, 2005045755L, 
                                     2005535472L, 2005630542L, 2005788781L, 2005809679L, 2005838317L, 
                                     2005866692L), text = c("asdf_xyz", "cat", "dog", "mouse", "elephant", 
                                                            "goose", "rat", "mice", "kitty", "kitten", "tiger_xyz", "lion", 
                                                            "leopard", "ostrich", "kangaroo", "platypus", "fish", "reptile", 
                                                            "mammals", "amphibians_xyz"), volume = c(1234L, 432L, 324L, 333L, 
                                                                                                     2223L, 412346L, 7456L, 3456L, 2345L, 2345L, 6L, 345L, 23L, 2L, 
                                                                                                     4778L, 234L, 8675L, 3459L, 8L, 9L)), class = "data.frame", row.names = c(NA, 
                                                                                                                                                                              -20L))

I then rename column 1 and 2 in book2 so that it matches that of book1 by names(book2)[1:2] <- c('id','title') where I can later do inner_join. The desired output is shown in the image below by

library(dplyr)
book1 %>%
  inner_join(book2, by = c("id", "title")) 

enter image description here

This is taking quite a few steps and wondering if there's a simplified version to this?


Solution

  • Found the solution to the first question

    library(janitor)
    book1 <- row_to_names(dat=book1, row_number=4, remove_row = TRUE, remove_rows_above = TRUE)
    

    I applied

    names(book1)[4:5] <- c('id1','title1')
    

    to obtain unique column name, then tried inner_join as proposed earlier but with error and found that book1$id is character where book2$id is int and so I did

    book1$id <- as.integer(book1$id)
    

    and finally it works with

    library(tidyverse)
    Yeah <- book1 %>%
      inner_join(book2, by = c("id", "title")) 
    

    Output below:

     id    title hazard       id1    title1       index volume
    1 633322173      cat      y 633322173      cat2  0.20089779    432
    2 634703802      dog      y 634703802      dog2 0.752228086    324
    3 634927873    mouse      n 634927873    mouse2 0.124729276    333
    4 635812953 elephant      n 635812953 elephant2 0.626285086   2223
    5 636004739    goose      y 636004739    goose2 0.134537909 412346
    6 636101211      rat      y 636101211      rat2 0.612526768   7456
    7 636157799     mice      n 636157799     mice2 0.769622463   3456
    8 636263106    kitty      n 636263106    kitty2 0.682532524   2345
    9 636752420   kitten      y 636752420   kitten2 0.819015658   2345
    

    Still wondering if there's a quicker way?