I'm a little perplex concerning the exact way to proceed with this wrangling procedure.
I've a dataset which consist in raters that are assessing lung sounds (S1,...,S40). For each sound the assessed some quality of the recording (_A, _B, _X).
The dataframe looks like this bd example (I gave an example with 3 raters, scoring 2 sounds for 2 different characteristics A and B). You can see that each row represents 1 rater results for the different sounds (S_1, S_2) and sounds characteristics (_A, _B).
bd <- data.frame(Operator=c("Rater1", "Rater2", "Rater3"),
S1_A=c("Good", "Good", "Bad"), #quality assessment
S1_B=c(3,2,3), #confidence in assessment 1-5 likert scale
S2_A=c("Don't know", "Bad", "Don't know"),
S2_B=c(4,4,2))
I'd like to get a result like that:
Operator | Sound | A | B |
---|---|---|---|
Rater1 | 1 | Good | 3 |
Rater1 | 2 | Don't know | 4 |
Rater2 | 1 | Good | 2 |
Rater2 | 2 | Bad | 4 |
Rater3 | 1 | Bad | 3 |
Rater3 | 2 | Don't know | 2 |
I'm not able to design adequately the code to first create a column identifying the sound and then adding the sounds characteristics.
Thanks in advance for your help!!!
You can use names_pattern =
with a regex to achieve your desired outcome. In this example:
names_to = c("Sound", ".value")
creates a column to store the original column names (Sound), and .value
is used to create a column for each suffix e.g. A and B.names_pattern = "S(\\d+)_(A|B)"
defines what values are carried over from the original column names to the new columns and column names. The brackets ()
in the regex are used to capture what values to carry over. The first capture corresponds to the Sound column, and captures a one or more digits using \\d+
but only if it is after a literal "S". For the .value
columns, the second capture looks for either A or B after a literal underscore.Note that this works for your sample data, but you mention _A, _B, and _X as suffixes. To include _X (or any other suffixes), simply add them to the second capture e.g. names_pattern = "S(\\d+)_(A|B|X)"
.
library(tidyr)
bd |>
pivot_longer(cols = -Operator,
names_to = c("Sound", ".value"),
names_pattern = "S(\\d+)_(A|B)")
# # A tibble: 6 × 4
# Operator Sound A B
# <chr> <chr> <chr> <dbl>
# 1 Rater1 1 Good 3
# 2 Rater1 2 Don't know 4
# 3 Rater2 1 Good 2
# 4 Rater2 2 Bad 4
# 5 Rater3 1 Bad 3
# 6 Rater3 2 Don't know 2