I have a data frame in R from a study where four different tests, say Test 1 to Test 4, could take place in any order. The recommendation was to carry out the tests in order from 1 to 4, but in practice this was seldom done. The data frame looks like this:
ID Test 1 Test 2 Test 3 Test 4
1 2020-5-1 2019-4-3 2020-6-2 2017-10-23
2 2016-1-24 2017-3-10 2018-9-17 2015-8-1
3 2017-4-3 2015-2-13 2020-8-19 2021-10-10
4 2019-8-2 2020-7-15 2013-3-1 2017-2-2
As the data set is quite large, each of the 24 possible orders could occur: for example, first Test 1, then Test 3, then Test 4, and last Test 2.
My question: I would like to get the frequencies of every such sequence of the four tests. This is easily done via a histogram, if I assign to each unique ID a sequence 1234, 1324, 4231 and so on. My problem is assigning those sequences of four numbers to each student. Is there a clever way of doing this?
What I've tried: I thought of sorting each row in ascending order. That gives me the correct order of the dates, but I lose the information about which test each date belongs to. So next I thought about labelling each column, but I am not sure whether the labels survive the sorting.
If the columns are formatted as Date, you can use order to retrieve the sequence of indexes you're asking for.
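To see what order returns for a single row, here is a minimal sketch using the four dates of ID 1 from the question. The variable names (dates, seq_taken, pos_of_test) are made up for illustration. It also contrasts order with rank, which answers a different question (at which position each test was taken) and is easy to confuse with it.

```r
# Dates of Test 1..Test 4 for one participant (ID 1 in the question)
dates <- as.Date(c("2020-05-01", "2019-04-03", "2020-06-02", "2017-10-23"))

# order(): which test was taken 1st, 2nd, 3rd, 4th
seq_taken <- order(dates)

# rank(): the position at which Test 1, Test 2, ... was taken
pos_of_test <- rank(as.numeric(dates))

paste(seq_taken, collapse = "")    # "4213": Test 4, then 2, then 1, then 3
paste(pos_of_test, collapse = "")  # "3241": Test 1 was 3rd, Test 2 was 2nd, ...
```

For the sequences described in the question ("first Test 1, then Test 3, ..."), order is the one that matches.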
data <- read.table(text = "ID;Test 1;Test 2;Test 3;Test 4
1;2020-5-1;2019-4-3;2020-6-2;2017-10-23
2;2016-1-24;2017-3-10;2018-9-17;2015-8-1
3;2017-4-3;2015-2-13;2020-8-19;2021-10-10
4;2019-8-2;2020-7-15;2013-3-1;2017-2-2", header = TRUE, sep = ";")
# Convert date columns
data[,2:5] <- lapply(data[,2:5], as.Date)
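# Determine the order of dates for each row.
# order() returns the test numbers in chronological order, so this
# generates strings such as 1234, 1243, 1324, ..., 4321
# and attaches them as a column to the data.
# Note: apply() converts the Date columns to "YYYY-MM-DD" strings,
# which sort in the same order as the dates themselves.
data$order <- apply(data[, 2:5], 1, function(row) {
  paste(order(row), collapse = '')
})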
# Count the occurrences of each order
order_counts <- table(data$order)
# Plot the frequencies as a bar chart
barplot(order_counts, main = "Frequency of Test Date Orders", xlab = "Order", ylab = "Frequency")
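One possible refinement of the code above, offered as a sketch rather than a definitive version: convert the dates to a numeric matrix before calling apply, so the comparison does not rely on the implicit conversion to character, and count against all 24 permutations so that orders which never occur show up with a frequency of 0. The names date_num and all_orders are introduced here for illustration, and the sketch assumes there are no missing dates and no two tests on the same day for one ID (order breaks ties by column position and puts NA last).

```r
# Same example data as above, built directly with Date columns
data <- data.frame(
  ID     = 1:4,
  Test.1 = as.Date(c("2020-05-01", "2016-01-24", "2017-04-03", "2019-08-02")),
  Test.2 = as.Date(c("2019-04-03", "2017-03-10", "2015-02-13", "2020-07-15")),
  Test.3 = as.Date(c("2020-06-02", "2018-09-17", "2020-08-19", "2013-03-01")),
  Test.4 = as.Date(c("2017-10-23", "2015-08-01", "2021-10-10", "2017-02-02"))
)

# Numeric matrix of dates (days since 1970-01-01), one row per ID
date_num <- sapply(data[, 2:5], as.numeric)

# Test numbers in chronological order, pasted into one string per row
data$order <- apply(date_num, 1, function(row) paste(order(row), collapse = ""))

# All 24 permutations of 1:4 as strings, built in base R
grid <- expand.grid(1:4, 1:4, 1:4, 1:4)
perms <- grid[apply(grid, 1, function(r) length(unique(r)) == 4), ]
all_orders <- sort(unname(apply(perms, 1, paste, collapse = "")))

# Count every possible order, including those with zero occurrences
order_counts <- table(factor(data$order, levels = all_orders))

barplot(order_counts, las = 2,
        main = "Frequency of Test Date Orders", ylab = "Frequency")
```

With the four example rows, the orders 4213, 4123, 2134 and 3412 each get a count of 1 and the remaining 20 bars are empty.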