r, dplyr, sas, r-haven

How can column labels/descriptions be set in R using values from a dataframe?


I often work with SAS datasets, which usually come with a description/label for each column. The labels are visible when using the View() function, where they appear as subtext under each column name.

My question has two parts: 1) how can one set those labels manually, and 2) can those labels be set using values present in a tibble?

For example, suppose I have the following dataset (very similar to one I actually just received):

library(dplyr)

df <- tibble(SBJID = c("Subject ID", 1, 2, 3, 4, 5),
             AGBL = c("Age at baseline", 54, 23, 18, 29, 31),
             LBCD = c("Parameter code", rep("HGB", 5)),
             LBRSSTDU = c("Result in standard units", 10, 12, 9, 14, 11)) 
SBJID       AGBL             LBCD            LBRSSTDU
Subject ID  Age at baseline  Parameter code  Result in standard units
1           54               HGB             10
2           23               HGB             12
3           18               HGB             9
4           29               HGB             14
5           31               HGB             11

I obviously don't want the first row to remain. I want to remove that row and use its values as the descriptions (labels) for the column headers, just as one would see in a SAS-derived data frame.

Any suggestions?


Solution

  • Try the labelled package, as shown below.

    Manual approach

    library(dplyr)      # for %>% and filter()
    library(labelled)
    
    # attach one label per column, then drop the label row
    df2 <- set_variable_labels(df,
                               SBJID = "Subject ID",
                               AGBL = "Age at baseline",
                               LBCD = "Parameter code",
                               LBRSSTDU = "Result in standard units") %>%
      filter(row_number() != 1)
    

    Created on 2023-08-02 with reprex v2.0.2

    Tibble approach (labels taken from the first row)

    library(dplyr)      # for tibble(), %>% and filter()
    library(labelled)
    
    # get the names of the variables
    nam <- names(df)
    
    # get the labels from the first row of the tibble
    lab <- unlist(df[1, ], use.names = FALSE)
    
    # create a tibble with name and label
    description <- tibble(name = nam, label = lab)
    
    # build a named list: list(SBJID = "Subject ID", ...)
    var_labels <- setNames(as.list(description$label), description$name)
    
    # set the labels with var_labels, then drop the label row
    df_labelled <- df %>%
      set_variable_labels(.labels = var_labels, .strict = FALSE) %>%
      filter(row_number() != 1)
    

    Created on 2023-08-02 with reprex v2.0.2
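One thing both approaches leave as-is: because the label row forced every column to character, the remaining values ("1", "54", "10", ...) are still character after that row is dropped. Below is a minimal sketch that also restores numeric types. It assumes base R's `type.convert()` (with `as.is = TRUE`) is good enough to guess the column types. The labels are attached last, so the conversion step cannot discard them.

```r
library(dplyr)
library(labelled)

df <- tibble(SBJID = c("Subject ID", 1, 2, 3, 4, 5),
             AGBL = c("Age at baseline", 54, 23, 18, 29, 31),
             LBCD = c("Parameter code", rep("HGB", 5)),
             LBRSSTDU = c("Result in standard units", 10, 12, 9, 14, 11))

# The first row as a named list: list(SBJID = "Subject ID", AGBL = ..., ...)
var_labels <- as.list(df[1, ])

df_labelled <- df %>%
  slice(-1) %>%                     # drop the label row
  type.convert(as.is = TRUE) %>%    # "54" -> integer; "HGB" stays character
  set_variable_labels(.labels = var_labels)
```

Afterwards, `var_label(df_labelled)` should return the labels as a named list, and `View(df_labelled)` should show them under the column names.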

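If you would rather not depend on labelled, the labels it sets are, as far as I know, simply the `"label"` attribute of each column. That is the same attribute haven uses for SAS variable labels and the one RStudio's View() displays as subtext. Here is a base-R sketch under that assumption. `df_base` and `labels` are illustrative names, not part of any package.

```r
library(tibble)

df <- tibble(SBJID = c("Subject ID", 1, 2, 3, 4, 5),
             AGBL = c("Age at baseline", 54, 23, 18, 29, 31),
             LBCD = c("Parameter code", rep("HGB", 5)),
             LBRSSTDU = c("Result in standard units", 10, 12, 9, 14, 11))

# Read the labels from the first row, then drop that row
labels <- unlist(df[1, ], use.names = FALSE)
df_base <- df[-1, ]

# Attach each label as the column's "label" attribute.
# Manually, for one column, this would be:
#   attr(df_base$SBJID, "label") <- "Subject ID"
for (i in seq_along(df_base)) {
  attr(df_base[[i]], "label") <- labels[i]
}
```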