rconditional-statementsrescale

Change metrics inside rows by condition


Suppose a data:

  df1 <- tibble::tribble(~"M1", ~"M2",  ~"Beer, pints", ~"Coffee, oz",  ~"Gasoline, galons",    ~"Milk, galons",    ~"Warehouse, square feet",  ~"Nearest place, miles",    
    "NY",   "22",   "10",   "12",   "15",   "100",  "100",  "20",
    "NY",   "20",   "9",    "10",   "12",   "100",  "100",  "20",
    "NY",   "18",   "8",    "9",    "11",   "100",  "100",  "20",
    "M1",   "M2",   "Beer, liters", "Coffee, cups (120 ml)",    "Gasoline, liters", "Milk, liters", "Warehouse, square meters", "Nearest place, kilometers",
    "PR",   "22",   "7",    "8",    "9",    "70",   "67",   "7",
    "PR",   "20",   "6",    "7",    "8",    "80",   "75",   "7",
    "M1",   "M2",   "Beer, pints",  "Coffee, oz",   "Gasoline, liters", "Milk, liters", "Warehouse, square feet",   "Nearest place, miles",
    "KR",   "22",   "6",    "6",    "7",    "60",   "50",   "9",
    "KR",   "20",   "5",    "6",    "8",    "55",   "65",   "9",
    "KR",   "18",   "5",    "6",    "8",    "50",   "55",   "9")

For visual representation: enter image description here

Is there a nice method to recalculate all columns in the same metrics (like if it is liters, then the entrire column should be liters; if miles (not kilometers), then the entire column to be miles [based on condition in the subheadings inside]? It could be great to think on the nicest methods to solve it.

PS: for information:

1 gallon = 3.78541 liters

1 pint = 0.473176 liters

1 oz = 0.0295735 liters

11 square feet = 1.02193 square meters

1 mile = 1.60934 kilometers

I am just wondering and just started to consider for solution. I am interested to look for possible nice solutions. In addition, it will be interesting for the entire R community to think on the best methods to edit the data by condition.


Solution

  • When the data is sloppy, we must also get our hands dirty.I thought of way, with many steps.

    Data

    df1 <-
    structure(list(m1 = c("M1", "NY", "NY", "NY", "M1", "PR", "PR", 
    "M1", "KR", "KR", "KR"), m2 = c("M2", "22", "20", "18", "M2", 
    "22", "20", "M2", "22", "20", "18"), beer = c("Beer, pints", 
    "10", "9", "8", "Beer, liters", "7", "6", "Beer, pints", "6", 
    "5", "5"), coffee = c("Coffee, oz", "12", "10", "9", "Coffee, cups (120 ml)", 
    "8", "7", "Coffee, oz", "6", "6", "6"), gasoline = c("Gasoline, galons", 
    "15", "12", "11", "Gasoline, liters", "9", "8", "Gasoline, liters", 
    "7", "8", "8"), milk = c("Milk, galons", "100", "100", "100", 
    "Milk, liters", "70", "80", "Milk, liters", "60", "55", "50"), 
        warehouse = c("Warehouse, square feet", "100", "100", "100", 
        "Warehouse, square meters", "67", "75", "Warehouse, square feet", 
        "50", "65", "55"), nearest_place = c("Nearest_place, miles", 
        "20", "20", "20", "Nearest place, kilometers", "7", "7", 
        "Nearest place, miles", "9", "9", "9")), row.names = c(NA, 
    -11L), class = c("tbl_df", "tbl", "data.frame"))
    

    Convert function

    convert_unit <- function(value,unit){
      
      m <- 
      case_when(
        unit == "galons" ~ 3.78541,
        unit == "pints" ~ 0.473176,
        unit == "oz" ~ 0.0295735,
        unit == "squarefeet" ~ 1.02193/11,
        unit == "miles" ~ 1.02193/11,
        TRUE ~ 1
        )
    
      output <- m*as.numeric(value)
      
      return(output)
    }
    

    Data preparation

    First, I would add the header as the first row and also create better names.

    library(dplyr)
    library(stringr)
    library(tidyr)
    #remotes::install_github("vbfelix/relper")
    library(relper)
    
    
    or_names <- names(df1)
    new_names <- str_to_lower(str_select(or_names,before = ","))
    n_row <- nrow(df1)
    
    df1[2:(n_row+1),] <- df1
    df1[1,] <- as.list(or_names)
    names(df1) <- new_names
    

    Data manipulation

    Then, I would create new columns with the units, and the apply the function to each one.

    df1 %>% 
      mutate(
        across(.cols = -c(m1:m2),.fns = ~str_keep(str_select(.,after = ",")),.names = "{.col}_unit"),
        aux = beer_unit == "",
        across(.cols = ends_with("_unit"),~if_else(. == "",NA_character_,.))) %>% 
      fill(ends_with("_unit"),.direction = "down") %>%
      filter(aux) %>% 
      mutate(
        across(
          .cols = beer:nearest_place,
          .fns = ~convert_unit(value = .,unit = get(str_c(cur_column(),"_unit")))
          )
      ) %>% 
      select(-aux,-ends_with("_unit"))
    

    Output

    # A tibble: 8 x 8
      m1    m2     beer coffee gasoline  milk warehouse nearest_place
      <chr> <chr> <dbl>  <dbl>    <dbl> <dbl>     <dbl>         <dbl>
    1 NY    22     4.73  0.355     56.8  379.      9.29         1.86 
    2 NY    20     4.26  0.296     45.4  379.      9.29         1.86 
    3 NY    18     3.79  0.266     41.6  379.      9.29         1.86 
    4 PR    22     7     8          9     70      67            7    
    5 PR    20     6     7          8     80      75            7    
    6 KR    22     2.84  0.177      7     60       4.65         0.836
    7 KR    20     2.37  0.177      8     55       6.04         0.836
    8 KR    18     2.37  0.177      8     50       5.11         0.836