I often work with SAS datasets, which tend to come with a description/label for each column, visible when using the View() function (they appear as subtext under each column name).
My question has two parts: 1) how can one set those labels manually, and 2) can those labels be set using values present in a tibble?
For example, let's say I have the following dataset (which is extremely similar to one I just actually received):
library(dplyr)
df <- tibble(SBJID = c("Subject ID", 1, 2, 3, 4, 5),
AGBL = c("Age at baseline", 54, 23, 18, 29, 31),
LBCD = c("Parameter code", rep("HGB", 5)),
LBRSSTDU = c("Result in standard units", 10, 12, 9, 14, 11))
SBJID | AGBL | LBCD | LBRSSTDU |
---|---|---|---|
Subject ID | Age at baseline | Parameter code | Result in standard units |
1 | 54 | HGB | 10 |
2 | 23 | HGB | 12 |
3 | 18 | HGB | 9 |
4 | 29 | HGB | 14 |
5 | 31 | HGB | 11 |
I obviously don't want the first row to remain; I want to remove that row and use its values as the descriptions for the column headers (again, as one would see in a SAS-derived data frame).
Any suggestions?
Please try the labelled package, as shown below.
manual approach
library(dplyr)
library(labelled)
# attach a label to each column, then drop the first row, which held the labels
df2 <- df %>%
  set_variable_labels(SBJID = "Subject ID",
                      AGBL = "Age at baseline",
                      LBCD = "Parameter code",
                      LBRSSTDU = "Result in standard units") %>%
  filter(row_number() != 1)
Created on 2023-08-02 with reprex v2.0.2
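To check that labels were actually attached, one option is to read them back with `var_label()`. The following is a minimal sketch on a small hypothetical data frame (`demo` is made up for illustration and is not the data from the question); it also shows the assignment form `var_label(x) <- value` as an alternative way to set labels by hand.

```r
library(labelled)

# Hypothetical two-column data frame, used only for illustration
demo <- data.frame(SBJID = 1:3, AGBL = c(54, 23, 18))

# Set a single label by assignment on one column
var_label(demo$SBJID) <- "Subject ID"

# Or set labels for selected columns with a named list
var_label(demo) <- list(AGBL = "Age at baseline")

# Read the labels back: once through labelled, once through the raw attribute
var_label(demo)
attr(demo$AGBL, "label")
```

The labels are stored in each column's `"label"` attribute, which is why they can also be inspected with `attr()`.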
tibble approach
library(dplyr)
library(labelled)
# get the names of the variables
nam <- names(df)
# get the labels from the first row of the tibble
lab <- unlist(df[1, ], use.names = FALSE)
# create a tibble with name and label
description <- tibble(name = nam, label = lab)
# build a named list: names are the column names, values are the labels
var_labels <- setNames(as.list(description$label), description$name)
# set the labels with var_labels, then drop the first row
df_labelled <- df %>%
  set_variable_labels(.labels = var_labels, .strict = FALSE) %>%
  filter(row_number() != 1)
Created on 2023-08-02 with reprex v2.0.2
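One thing to be aware of: because the first row held text, every column in `df` is character, and it stays character after that row is dropped. A possible variation, sketched below under the assumption that base R's `type.convert()` guesses the intended types correctly for this data, is to drop the row, convert the columns, and attach the labels last so the conversion cannot discard them. The names `labels` and `df_clean` are introduced here for illustration only.

```r
library(dplyr)
library(labelled)

df <- tibble(SBJID = c("Subject ID", 1, 2, 3, 4, 5),
             AGBL = c("Age at baseline", 54, 23, 18, 29, 31),
             LBCD = c("Parameter code", rep("HGB", 5)),
             LBRSSTDU = c("Result in standard units", 10, 12, 9, 14, 11))

# Named list of labels taken from the first row (names are the column names)
labels <- as.list(df[1, ])

df_clean <- df %>%
  slice(-1) %>%                                                   # drop the label row
  mutate(across(everything(), ~ type.convert(.x, as.is = TRUE))) %>%  # re-guess column types
  set_variable_labels(.labels = labels)                           # attach labels last

var_label(df_clean)
str(df_clean)
```

Attaching the labels as the final step is a deliberate ordering choice: functions that rebuild a vector do not always carry attributes such as `"label"` along.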