r covariance-matrix iris-dataset

Problem with finding covariance matrix for Iris data in R


I keep getting NAs when trying to find the covariance matrix for the Iris data in R.

library(ggplot2)
library(dplyr)

dim(iris)
head(iris)

numIris <- iris %>% 
  select_if(is.numeric)

plot(numIris[1:100,])

Xraw <- numIris[1:1000,]

plot(iris[1:150,-c(5)]) #species name is the 5th column; excluding it here.
Xraw = iris[1:1000,-c(5)] # this excludes the 5th column, which is the species column
#first, to get covariance, we need to subtract the mean from each column

X = scale(Xraw, scale = FALSE)

head(X)

Xs <- scale(Xraw, scale = TRUE)
head(Xs)

covMat  = (t(X)%*%X)/ (nrow(X)-1)
head(covMat)

Solution

  • Is there a reason you can't use cov(numIris)?

    Selecting 1000 rows from a data frame that has only 150 rows gives you 850 extra rows filled with NA values (try tail(Xraw) to see them). Those NAs survive scale() and then propagate through the matrix product, so every entry of covMat comes out as NA. If you set Xraw <- iris[, -5] and go from there, all.equal(covMat, cov(iris[, -5])) is TRUE.
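
    A minimal sketch of the point above, using only base R and the built-in iris data. The name Xraw_bad is made up here for illustration; everything else follows the question's code. It first shows the NA padding caused by over-indexing, then recomputes the covariance matrix by hand from all 150 real rows and compares it with cov().

```r
# Indexing past the last row of a data frame pads the result with NA rows.
Xraw_bad <- iris[1:1000, -5]          # hypothetical name, reproduces the problem
nrow(Xraw_bad)                        # 1000 rows, although iris has only 150
sum(!complete.cases(Xraw_bad))        # number of rows containing NA

# Keep every real row instead of hard-coding a row range.
Xraw <- iris[, -5]                    # drop the Species column (column 5)
X <- scale(Xraw, scale = FALSE)       # centre each column, no rescaling

# Sample covariance: X'X / (n - 1) on the centred data.
covMat <- (t(X) %*% X) / (nrow(X) - 1)

# Compare the manual result with the built-in function.
all.equal(covMat, cov(Xraw))
```

    Taking all rows with iris[, -5] rather than a fixed range such as 1:150 avoids the problem even if the number of rows changes later.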