I'm working on a function for comparing the structures of two data frames in R, in order to build a validation filter for uploading data into a table. I realize that when you have a data frame in R where a column has both numbers and text strings, the entire column defaults to the "character" class.
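A quick way to see that coercion, as a minimal sketch (the vector name mixed is only for illustration):

```r
# c() coerces mixed inputs to the most general type, here character
mixed <- c(1, 2, "three")
class(mixed)

# as.numeric() recovers the numeric-looking entries and gives NA for the rest
suppressWarnings(as.numeric(mixed))
```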
Suppose we have this mixed data frame:
> df1
   col1 col2
1     1    4
2     2 five
3 three    6
Here df1 is built via:
df1 <- data.frame(
  col1 = c("1", "2", "three"),
  col2 = c("4", "five", "6")
)
And we have another mixed data frame df2:
> df2
  col1 col2
1   11   14
2   12 fill
3 tree   16
df2 <- data.frame(
  col1 = c("11", "12", "tree"),
  col2 = c("14", "fill", "16")
)
I'd like to run a structure comparison between the two, as if the data frame elements that could be converted to numerics had actually been converted, and ignoring the actual values. In the comparison of df1 and df2, the structures match. Is there a way to run this type of comparison in R?
Continuing the example, suppose we have another data frame df3 that we want to compare with df1. Here there would be no structure match, since df3[2,1] is a text string while df1[2,1] contains an element (2) that can be converted to a numeric:
> df3
  col1 col2
1   11   14
2 kats fill
3 tree   16
df3 <- data.frame(
  col1 = c("11", "kats", "tree"),
  col2 = c("14", "fill", "16")
)
You can check each cell for a non-digit character (\\D) and then compare the resulting TRUE/FALSE patterns of the two data frames.
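compare <- function(x, y) {
  # Flag every cell that contains a non-digit character, column by column,
  # then check that both data frames produce the same pattern
  identical(sapply(x, grepl, pattern = "\\D"),
            sapply(y, grepl, pattern = "\\D"))
}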
compare(df1, df2)
# [1] TRUE
compare(df1, df3)
# [1] FALSE
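One limitation to keep in mind: \\D matches any non-digit, so cells such as "1.5" or "-2" would be classed as text. If those should count as numeric, a possible variant is to test whether as.numeric() can parse each cell. This is only a sketch under that assumption; the names is_numeric_like, compare_numeric_like and the data frame df4 are hypothetical and not part of the code above.

```r
df1 <- data.frame(col1 = c("1", "2", "three"), col2 = c("4", "five", "6"))
df2 <- data.frame(col1 = c("11", "12", "tree"), col2 = c("14", "fill", "16"))
df3 <- data.frame(col1 = c("11", "kats", "tree"), col2 = c("14", "fill", "16"))
# Hypothetical extra case with a decimal, a negative and scientific notation
df4 <- data.frame(col1 = c("1.5", "-2", "x"), col2 = c("4", "y", "6e2"))

# Hypothetical helper: TRUE where a cell can be parsed as a number.
# as.character() also covers factor columns (the data.frame default before R 4.0).
is_numeric_like <- function(col) {
  !is.na(suppressWarnings(as.numeric(as.character(col))))
}

# Structures match when shape, column names and the numeric/text pattern agree
compare_numeric_like <- function(x, y) {
  identical(dim(x), dim(y)) &&
    identical(names(x), names(y)) &&
    identical(sapply(x, is_numeric_like), sapply(y, is_numeric_like))
}

compare_numeric_like(df1, df2)
# [1] TRUE
compare_numeric_like(df1, df3)
# [1] FALSE
compare_numeric_like(df1, df4)
# [1] TRUE
```

Note that as.numeric() is more permissive than a digits-only check: it also accepts strings like "1e5" or "Inf", and missing values count as non-numeric here. Whether that is acceptable depends on what the upload filter should allow.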