I have a large dataframe in R and am trying to do some stats tests on certain columns, but the non-programmers who made the csv file added a bunch of text notes that I need to ignore.
For example a column might have values: 12,20,40,missing,64,32,no input,45,10
How do I only select the numbers using the which statement? I failed miserably trying: my_data_frame$Column.Title[which(is.numeric(my_data_frame$Column.Title))]
What do I change in the which function to only select the numbers and ignore the text? Thanks!
You can use the built-in as.numeric() converter to do something like this:
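x <- my_data_frame$Column.Title
# as.character() first: on a factor column, as.numeric() alone returns the level codes
xn <- as.numeric(as.character(x))  # warns "NAs introduced by coercion", which is expected here
which(!is.na(xn))                  # indices of the entries that parsed as numbers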
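The original attempt fails because is.numeric() is not applied element by element: it returns a single TRUE or FALSE describing the type of the whole vector, and a column containing text is never numeric. As a rough, self-contained sketch of the coercion approach, here is a made-up data frame that mirrors the values in the question (the data frame itself and the names idx and nums are assumptions for illustration):

```r
# Hypothetical data frame mirroring the values in the question
my_data_frame <- data.frame(
  Column.Title = c("12", "20", "40", "missing", "64", "32", "no input", "45", "10"),
  stringsAsFactors = TRUE  # mimic the factor columns that older read.csv() defaults produce
)

x <- my_data_frame$Column.Title

# Convert to character first, then to numeric; text entries become NA.
# suppressWarnings() hides the expected "NAs introduced by coercion" warning.
xn <- suppressWarnings(as.numeric(as.character(x)))

idx  <- which(!is.na(xn))  # positions of the entries that parsed as numbers
nums <- xn[idx]            # the numeric values themselves
```

From there, something like mean(nums) or t.test(nums) should work on the cleaned values, while idx can be used to subset other columns of the same rows.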
This won't distinguish between NAs created by failed coercion and pre-existing (numeric) NA values.
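If that distinction matters, one possible workaround (a sketch, not the only way) is to record which entries were already NA before converting, and compare afterwards. The sample vector and the names was_missing, failed and bad_values are made up for illustration:

```r
# Hypothetical column: position 2 is a genuine NA, positions 3 and 5 are text notes
x <- c("12", NA, "missing", "64", "no input")

xc <- as.character(x)                    # also handles factor input
xn <- suppressWarnings(as.numeric(xc))   # text entries become NA

was_missing <- is.na(xc)                 # NA before any conversion
failed      <- is.na(xn) & !was_missing  # NA only because coercion failed

failed_idx <- which(failed)              # positions of the text notes
bad_values <- unique(xc[failed])         # the distinct notes that were found
```

Inspecting bad_values is also a quick way to see which text notes actually occur in the column before deciding how to treat them.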
If there are only a few distinct "missing" markers, you could instead declare them when reading the file, with read.csv(..., na.strings=c("NA","missing","no input")), so that the column is read in as numeric from the start.
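A small sketch of the na.strings route, using the text argument of read.csv() in place of a real file so that it runs on its own (the CSV content and the names csv_text and df are assumptions based on the example values in the question):

```r
# Hypothetical CSV content standing in for the real file
csv_text <- "Column.Title\n12\n20\n40\nmissing\n64\n32\nno input\n45\n10\n"

# Every string listed in na.strings is read as NA, so the column can be parsed as numbers
df <- read.csv(text = csv_text, na.strings = c("NA", "missing", "no input"))

str(df)                                          # check the column type
col_mean <- mean(df$Column.Title, na.rm = TRUE)  # na.rm = TRUE skips the NA entries
```

With a real file, the text argument would be replaced by the file path. Any note that is not listed in na.strings will still force the column to be read as text, so this only works when every marker is known in advance.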