r date frequency

Finding the frequency of the order of four different tests in R


I have a data frame in R from a study in which four different tests, say Test 1 to Test 4, could take place in any order. The recommendation was to carry out the tests in order from 1 to 4, but in practice this was seldom done. The data frame looks like this:

    ID  Test 1     Test 2     Test 3     Test 4
    1   2020-5-1   2019-4-3   2020-6-2   2017-10-23
    2   2016-1-24  2017-3-10  2018-9-17  2015-8-1
    3   2017-4-3   2015-2-13  2020-8-19  2021-10-10
    4   2019-8-2   2020-7-15  2013-3-1   2017-2-2

As the data set is quite large, each of the 24 possible orders could occur: for example, Test 1 first, then Test 3, then Test 4, and Test 2 last.

My question: I would like to get the frequency of each such sequence of the four tests. This is easy with a histogram once each unique ID has been assigned a sequence such as 1234, 1324, 4231 and so on. My problem is assigning those four-digit sequences to each student. Is there a clever way of doing this?

What I've tried: I thought of sorting each row in ascending order. That gives me the dates in the correct order, but I lose the information about which test each date belongs to. So next I thought about labelling each column, but I am not sure whether the labels survive the sorting.


Solution

  • If the columns are formatted as Date, you can use order to retrieve the sequence of indexes you're asking for.

    data <- read.table(text = "ID;Test 1;Test 2;Test 3;Test 4
    1;2020-5-1;2019-4-3;2020-6-2;2017-10-23
    2;2016-1-24;2017-3-10;2018-9-17;2015-8-1
    3;2017-4-3;2015-2-13;2020-8-19;2021-10-10
    4;2019-8-2;2020-7-15;2013-3-1;2017-2-2", header = TRUE, sep = ";")
    
    # Convert date columns
    data[,2:5] <- lapply(data[,2:5], as.Date)
    
    # Determine the order of dates for each row
    # This generates the 1234, 1243, 1324, ..., 4321 string sequences
    # and attaches it as a column to the data
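    data$order <- apply(data[,2:5], 1, function(row) {
      # apply() hands each row over as character strings, so convert back
      # to Date to make sure the sort is chronological, not alphabetical
      paste(order(as.Date(row)), collapse='')
    })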
    
    # Count the occurrences of each order
    order_counts <- table(data$order)
    
    # Plot the frequency of each order as a bar chart
    barplot(order_counts, main = "Frequency of Test Date Orders", xlab = "Order", ylab = "Frequency")
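
As a quick check of what the sequence strings mean, here is a minimal sketch on the four sample rows from the question. It assumes the dates are already stored as Date columns. The names dates, days, seq_by_time, pos_by_test and all_orders are made up for this illustration. order() lists the test numbers in the order they were taken, which is what the question asks for, whereas rank() gives the position of each test and is easy to confuse with it. The last part is an optional extra: it builds all 24 permutations as factor levels so that orders that never occur show up with a count of 0.

```r
# Sample dates for the four IDs from the question, one Date column per test
dates <- data.frame(
  Test.1 = as.Date(c("2020-05-01", "2016-01-24", "2017-04-03", "2019-08-02")),
  Test.2 = as.Date(c("2019-04-03", "2017-03-10", "2015-02-13", "2020-07-15")),
  Test.3 = as.Date(c("2020-06-02", "2018-09-17", "2020-08-19", "2013-03-01")),
  Test.4 = as.Date(c("2017-10-23", "2015-08-01", "2021-10-10", "2017-02-02"))
)

# Numeric matrix (days since 1970-01-01), so apply() compares numbers
days <- sapply(dates, as.numeric)

# order(): which test was taken 1st, 2nd, 3rd, 4th
seq_by_time <- apply(days, 1, function(x) paste(order(x), collapse = ""))
print(seq_by_time)   # "4213" "4123" "2134" "3412"

# rank(): at which position Test 1, Test 2, Test 3, Test 4 was taken
# (ties would produce fractional ranks, so this assumes distinct dates)
pos_by_test <- apply(days, 1, function(x) paste(rank(x), collapse = ""))
print(pos_by_test)   # "3241" "2341" "2134" "3412"

# All 24 permutations of 1:4, used as factor levels so that
# sequences that never occur are reported with a count of 0
grid <- expand.grid(1:4, 1:4, 1:4, 1:4)
perms <- grid[apply(grid, 1, function(p) !anyDuplicated(p)), ]
all_orders <- sort(unname(apply(perms, 1, paste, collapse = "")))
counts <- table(factor(seq_by_time, levels = all_orders))
print(counts[counts > 0])
```

For ID 1, "4213" reads as: Test 4 first, then Test 2, then Test 1, and Test 3 last. Passing counts to barplot() instead of a plain table() keeps all 24 bars on the x axis, which can make gaps in the data easier to spot.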