I'm a student working on a homework assignment where I need to implement the KNN algorithm using the Mahalanobis distance as the metric, but for some reason I can't figure out, my code is not working.
I'm not an R expert; I only know the basics.
library(FNN)
library(readr)
library(pvclass)
library(corpcor)
iono <- read_csv("C:/Users/bruno/Dropbox/Eng. Computação/17.2/IA/Prática Ionosphere/ionosphere.data.txt",
col_names = FALSE)
p <- 0.8
iono <- as.matrix(iono)
# generate indices to select rows of the matrix
train_idx <- sample(x = nrow(iono), size = p*nrow(iono), replace = FALSE)
test_idx <- c(1:nrow(iono))[-train_idx]
# build the training/test data matrices
iono_train_x <- iono[train_idx, 1:34]
iono_train_y <- iono[train_idx, 35]
iono_test_x <- iono[test_idx, 1:34]
iono_test_y <- iono[test_idx, 35]
# -------- Implementing the Mahalanobis distance function for KNN
# x - training data matrix
# xtest - test data matrix
# cx - Mahalanobis parameter matrix
mahalanobis_xy_knn <- function(xtest, x, cx) {
  Mdist <- matrix(0, nrow = nrow(xtest), ncol = nrow(x))
  for(i in 1:nrow(xtest)){
    Mdist[i,] <- mahalanobis(x = x, center = xtest[i, ], cov = cx, inverted = TRUE)
  }
  return(Mdist)
}
# --------------- KNN algorithm with the Mahalanobis metric
knn_custom <- function(Xtrain, Xtest, Ytrain, k, M){
  # ------ Get the distances Dist(MxN) from every Xtest(Mxd) to every Xtrain(Nxd)
  # ------ using the learned metric
  Dist <- mahalanobis_xy_knn(Xtest,Xtrain, M)
  # Dist <- dist()
  #dados <- data.frame(Dist, Ytrain)
  Yhat <- matrix(0, nrow = nrow(Xtest), ncol = 1)
  label_um <- 0
  label_dois <- 0
  # ---- Compute the label of each Xtest
  for(i in 1:length(Yhat)){
    # Combine Dist and Y into a data frame
    dados <- data.frame(Dist[i,], Ytrain)
    # Order the data frame by distance
    ind <- order(dados$Dist.i...)
    # Take the K nearest labels
    k_labels_proximos <- Y[ind[1:k]]
    # Check the majority
    for(j in 1:k){
      if (k_labels_proximos[j] == 1) label_um <- label_um + 1
      else label_dois <- label_dois + 1
    }
    if(label_um > label_dois) Yhat[i] <- 1
    else if(label_um < label_dois) Yhat[i] <- 2
    label_um <- 0
    label_dois <- 0
  }
  return(Yhat)
}
# ------------- Learning the Mahalanobis metric
#dados <- data.frame(iono_test_x,iono_test_y)
M_cov <- cov(iono)
inv_m_cov <- pseudoinverse(iono)
M_ident <- diag(ncol(iono))
# ------ Apply K-NN with the Mahalanobis metric using the covariance matrix
saida_knn_maha <- knn_custom(train_idx, iono_test_x, iono_test_y, k, M_cov)
acc_knn_maha <- sum(iono_test_y == saida_knn_maha)/length(iono_test_y) * 100
When I try to run the code, this is the error I get:
Error: is.numeric(x) || is.logical(x) is not TRUE
RStudio doesn't show me where the error is, so I can't fix it. Is the problem in the comparisons?
It would help if you could share the structure of your data and a few sample observations. I assume your "Ionosphere" dataset is the same as this one: https://www.rdocumentation.org/packages/mlbench/versions/2.1-1/topics/Ionosphere
If the error is from this line:
M_cov <- cov(iono)
Check whether iono, at the point where cov is called, contains non-numeric variables (e.g. factor or character columns) or missing values. You can inspect its structure with:
str(iono)
All non-numeric variables should be excluded before calling cov to avoid this error.
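As a rough illustration of that advice, here is a minimal sketch. It uses a small made-up data frame (`toy`, with a character label column) as a stand-in for your file, since I can't see your actual data; the names `toy`, `toy_x` and `toy_y` are hypothetical. It shows that `as.matrix()` on mixed columns produces a character matrix that `cov()` rejects, and that keeping only the numeric columns avoids the error.

```r
# Hypothetical toy data standing in for the ionosphere file:
# two numeric feature columns plus a character class label ("g"/"b").
set.seed(1)
toy <- data.frame(
  X1 = rnorm(10),
  X2 = rnorm(10),
  X3 = sample(c("g", "b"), 10, replace = TRUE),
  stringsAsFactors = FALSE
)

# as.matrix() on mixed columns yields a character matrix, which cov() rejects.
toy_mat <- as.matrix(toy)
is_char <- is.character(toy_mat)
cov_fails <- inherits(try(cov(toy_mat), silent = TRUE), "try-error")

# Keep only the numeric columns for the covariance; keep the label separately.
num_cols <- sapply(toy, is.numeric)
toy_x <- as.matrix(toy[, num_cols])
toy_y <- toy[[which(!num_cols)]]

# cov() now works because toy_x is a numeric matrix.
M_cov <- cov(toy_x)
# mahalanobis(..., inverted = TRUE) expects the inverse of the covariance.
inv_m_cov <- solve(M_cov)
```

Here `solve()` is enough because the toy covariance matrix is invertible. If the covariance of your real features turns out to be singular, a pseudo-inverse of `M_cov` (for example with `pseudoinverse()` from corpcor, which you already load) may be needed instead.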