data <- data.frame(Q1_1 = c("abc",1,2,3,4),
Q1_2 = c("def",2,3,6,2),
Q2_1 = c("ghi",4,5,3,1),
Q2_2 = c("jkl",6,3,8,4),
ID = c("ID", "AA", "BB", "CC", "DD"),
Order = c("Order", "zz", "ss", "tt", "qq"))
The above resembles my data frame in terms of its layout, which should look like the one below. The actual data are saved as in a CSV file, once you read it into R with read.csv(), it would create a data frame like this.
Q1_1 Q1_2 Q2_1 Q2_2 ID Order
1 abc def ghi klm ID Order
2 1 2 4 6 AA zz
3 2 3 5 3 BB ss
4 3 6 3 8 CC tt
5 4 2 1 4 DD qq
The trickiest thing about it is that the data file has the first two rows as headers, and once it is read into R, it produces a data frame that presents the secondary header in the first row.
This is what I want to end up seeing. There are the codes that I manually wrote.
melted <- data.frame(response = c(1,2,3,4,2,3,6,2,4,5,3,1,6,3,8,4),
question_code = c("abc","abc","abc","abc",
"def","def","def","def",
"ghi","ghi","ghi","ghi",
"jkl","jkl","jkl","jkl"),
question_number = c("Q1_1","Q1_1","Q1_1","Q1_1",
"Q1_2","Q1_2","Q1_2","Q1_2",
"Q2_1","Q2_1","Q2_1","Q2_1",
"Q2_2","Q2_2","Q2_2","Q2_2"),
ID = c("AA", "BB", "CC", "DD",
"AA", "BB", "CC", "DD",
"AA", "BB", "CC", "DD",
"AA", "BB", "CC", "DD"),
Order = c("zz", "ss", "tt", "qq",
"zz", "ss", "tt", "qq",
"zz", "ss", "tt", "qq",
"zz", "ss", "tt", "qq"))
And it should look like this...
response question_code question_number ID Order
1 1 abc Q1_1 AA zz
2 2 abc Q1_1 BB ss
3 3 abc Q1_1 CC tt
4 4 abc Q1_1 DD qq
5 2 def Q1_2 AA zz
6 3 def Q1_2 BB ss
7 6 def Q1_2 CC tt
8 2 def Q1_2 DD qq
9 4 ghi Q2_1 AA zz
10 5 ghi Q2_1 BB ss
11 3 ghi Q2_1 CC tt
12 1 ghi Q2_1 DD qq
13 6 jkl Q2_2 AA zz
14 3 jkl Q2_2 BB ss
15 8 jkl Q2_2 CC tt
16 4 jkl Q2_2 DD qq
So basically I want to stack the values ("response") into one column while creating four columns as references. Two of the four columns are question_code and question__name, which are transposed from the original two-header. The rest two column simply presents ID and Order (I do not know if the way I explained is understandable, but all the codes are here).
If the data file has only one row as header, I know how to do it with melt()
, but this two-row header seems to largely complicates the situation.
data %>%
mutate(across(starts_with('Q'), ~c(str_c(cur_column(),.[1], sep=':'), .[-1]))) %>%
set_names(.[1,]) %>%
slice(-1) %>%
pivot_longer(starts_with('Q'), values_to = 'response', names_sep = ':',
names_to = c("question_number", "question_code"))
# A tibble: 16 × 5
ID Order question_number question_code response
<chr> <chr> <chr> <chr> <chr>
1 AA zz Q1_1 abc 1
2 AA zz Q1_2 def 2
3 AA zz Q2_1 ghi 4
4 AA zz Q2_2 jkl 6
5 BB ss Q1_1 abc 2
6 BB ss Q1_2 def 3
7 BB ss Q2_1 ghi 5
8 BB ss Q2_2 jkl 3
9 CC tt Q1_1 abc 3
10 CC tt Q1_2 def 6
11 CC tt Q2_1 ghi 3
12 CC tt Q2_2 jkl 8
13 DD qq Q1_1 abc 4
14 DD qq Q1_2 def 2
15 DD qq Q2_1 ghi 1
16 DD qq Q2_2 jkl 4