
Plot a large actual vs. expected dataset in R


My dataset contains three columns: ID (N= 1000), expected score, and actual score. The score can be 1, 2, 3, or 4.

Here is a simulated dataset:

ID <- seq(from = 1, to = 1000, by=1)
actual <- round(runif(1000, min=1, max=4))
expected <- round(runif(1000, min=1, max=4))
mydata <- data.frame(ID, actual, expected)

We can easily create a contingency table using

table(mydata$actual, mydata$expected)

I need to plot this data for each ID, so imagine the plot as a 1000 by 1000 matrix.

If Actual=Expected, the color of these cells will be white
If Actual < Expected, the color of these cells will be red
If Actual > Expected, the color of these cells will be blue

Solution

  • There is only one ID per pair of actual and expected, so it will be a linear graph. You don't want to plot actual and expected values, right?

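    # Simulate 1000 IDs with actual and expected scores in 1..4
    ID <- seq(from = 1, to = 1000, by = 1)
    actual <- round(runif(1000, min = 1, max = 4))
    expected <- round(runif(1000, min = 1, max = 4))
    mydata <- data.frame(ID, actual, expected)
    # 4x4 contingency table of actual vs. expected
    t <- table(mydata$actual, mydata$expected)
    # One color per ID: white if equal, red if actual < expected, blue otherwise
    col1 <- ifelse(actual == expected, "white", ifelse(actual < expected, "red", "blue"))
    # Filled circles with a black outline, so the white points stay visible
    plot(ID, pch = 21, bg = col1)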
    


    But if you want a 4x4 matrix with colors and boxes that represent frequencies, you can plot the table directly:

    plot(t,col=col1) 
    

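    As a complement to the mosaic plot, here is a minimal sketch of a 4x4 grid in which each cell is colored by the rule from the question (white if actual = expected, red if actual < expected, blue if actual > expected) and labeled with its frequency. The seed and the names `lv`, `tab`, `sgn`, and `cell_col` are assumptions introduced for illustration; they are not part of the original code.

    ```r
    set.seed(1)  # assumed seed, only to make the sketch reproducible
    actual   <- round(runif(1000, min = 1, max = 4))
    expected <- round(runif(1000, min = 1, max = 4))

    # Fix the levels so the table is always 4x4, even if a score never occurs
    lv  <- 1:4
    tab <- table(actual = factor(actual, lv), expected = factor(expected, lv))

    # Sign of (actual - expected) per cell: -1 -> red, 0 -> white, 1 -> blue
    sgn      <- sign(outer(lv, lv, "-"))  # rows = actual, columns = expected
    cell_col <- matrix(c("red", "white", "blue")[sgn + 2], nrow = 4)

    # Empty plot, then one rectangle per cell with its count printed inside
    plot(NA, xlim = c(0.5, 4.5), ylim = c(0.5, 4.5),
         xlab = "expected", ylab = "actual", xaxt = "n", yaxt = "n", asp = 1)
    axis(1, at = lv)
    axis(2, at = lv)
    rect(col(tab) - 0.5, row(tab) - 0.5, col(tab) + 0.5, row(tab) + 0.5,
         col = cell_col, border = "grey40")
    text(col(tab), row(tab), labels = tab)
    ```

    The diagonal comes out white, cells above it (actual > expected) blue, and cells below it red, so the counts can be read together with the direction of the mismatch.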

    Edit: I guess what you want is a map of ANY actual vs. ANY expected? This can be done more elegantly, but for lack of time I cannot provide a full solution with your desired colors. Here is a quick solution with basic colors (the color scheme is also coded in, as colID). Suppose you have N = 5.

    set.seed(12345)
    ID <- seq(from = 1, to = 5, by=1)
    actual <- round(runif(5, min=1, max=4))
    expected <- round(runif(5, min=1, max=4))
    mydata <- data.frame(ID, actual, expected)
    
    > mydata
      ID actual expected
    1  1      3        1
    2  2      4        2
    3  3      3        3
    4  4      4        3
    5  5      2        4
    
    colID = matrix("",5,5)
    arr = matrix(0,5,5)
    for (i in 1:5) {
      for (j in 1:5) {
        colID[i,j] = ifelse(actual[i] == expected[j] , "green", ifelse(actual[i] < expected[j], "red", "blue")) 
        arr[i,j] = ifelse(actual[i] == expected[j] , 1, ifelse(actual[i] < expected[j], 2, 3)) 
      }  
    }
    
    > arr
         [,1] [,2] [,3] [,4] [,5]
    [1,]    3    3    1    1    2
    [2,]    3    3    3    3    1
    [3,]    3    3    1    1    2
    [4,]    3    3    3    3    1
    [5,]    3    1    2    2    2
    > colID
         [,1]   [,2]    [,3]    [,4]    [,5]   
    [1,] "blue" "blue"  "green" "green" "red"  
    [2,] "blue" "blue"  "blue"  "blue"  "green"
    [3,] "blue" "blue"  "green" "green" "red"  
    [4,] "blue" "blue"  "blue"  "blue"  "green"
    [5,] "blue" "green" "red"   "red"   "red"  
    
    > image(arr)
    


    Logic: create an N x N array with three levels, either custom colors or custom integers (1, 2, 3), and plot it as an image. Time permitting, I will try to make the colors custom in image(), but I cannot guarantee it.
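    One possible way to get the custom colors, sketched under the assumption that base R's `image()` with its `col` and `breaks` arguments is acceptable: build the same integer matrix with `outer()` instead of the double loop, then map 1, 2, 3 to white, red, and blue. The names `n` and `d` are introduced here for illustration only.

    ```r
    set.seed(12345)
    n        <- 5
    actual   <- round(runif(n, min = 1, max = 4))
    expected <- round(runif(n, min = 1, max = 4))

    # Compare every actual[i] with every expected[j] in one call (no loops):
    # 1 = equal, 2 = actual < expected, 3 = actual > expected
    d   <- outer(actual, expected, "-")
    arr <- ifelse(d == 0, 1, ifelse(d < 0, 2, 3))

    # image() draws arr[i, j] at x = i, y = j, so rows of arr run along the x axis.
    # breaks + col pin each code to one color: 1 -> white, 2 -> red, 3 -> blue
    image(x = seq_len(n), y = seq_len(n), z = arr,
          col = c("white", "red", "blue"), breaks = c(0.5, 1.5, 2.5, 3.5),
          xlab = "ID (actual)", ylab = "ID (expected)")
    ```

    With the seed above, `arr` matches the matrix printed earlier. Setting `n <- 1000` should give the 1000 by 1000 map from the question, since `outer()` just builds a matrix with one million cells.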