I have a data frame with a column whose observations mix words and Roman numerals. The column also contains plain integers, plain words (like the observation "Apple"), and NAs, all of which I want to leave unchanged.
So it has observations like:
x <- data.frame(col = c("15", "NA", "0", "Red", "iv", "Logic", "ix. Sweet", "VIII - Apple",
"Big XVI", "WeirdVII", "XI: Small"))
What I want is to convert every observation that contains a Roman numeral (even the ones that are mixed with words) into the corresponding integer. So, following the example, the resulting column would be:
15 NA 0 Red 4 Logic 9 8 16 7 11
Is there any way to do this?
What I have attempted is:
library(stringr)
library(gtools)
roman <- str_extract(x$col, "([IVXivx]+)")
roman_to_int <- roman2int(roman)
x$col <- ifelse(!is.na(roman_to_int), roman_to_int, x$col)
However, this does not work, because observations that are plain words with no Roman numeral in them are also converted to integers. For example, "Logic" becomes "1". I want to avoid this.
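The spurious "1" appears to come from the character class itself: `[IVXivx]+` matches those letters anywhere, including inside ordinary words. A minimal check, assuming only stringr is loaded (the `hits` name is just for illustration):

```r
library(stringr)

# The class matches the first run of i/v/x letters, even inside a word
hits <- str_extract(c("Logic", "Big XVI", "Red"), "[IVXivx]+")
hits
# [1] "i" "i" NA
```

Note that "Big XVI" also yields "i" (from "Big") rather than "XVI", so the first match is not necessarily the numeral.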
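A stricter pattern avoids the false matches. It accepts a run of two or more uppercase Roman letters anywhere, or a run of Roman letters (all lowercase or all uppercase) that forms a whole word:
pat <- "[IVXLCDM]{2,}|\\b[ivxlcdm]+\\b|\\b[IVXLCDM]+\\b"
# roman2int() returns integers; as.character() guarantees the replacement function returns strings
str_replace_all(x$col, pat, function(m) as.character(gtools::roman2int(m)))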
[1] "15" "NA" "0" "Red" "4"
[6] "Logic" "9. Sweet" "8 - Apple" "Big 16" "Weird7"
[11] "11: Small"
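The output above keeps the surrounding words ("9. Sweet", "Big 16"), whereas the question asks for the integer alone. One possible variation, sketched here under the assumption that each observation holds at most one numeral, extracts the match instead of replacing it and overwrites only the rows that matched. The names `roman`, `has_roman` and `res` are illustrative, not from the original posts.

```r
library(stringr)

x <- data.frame(col = c("15", "NA", "0", "Red", "iv", "Logic", "ix. Sweet",
                        "VIII - Apple", "Big XVI", "WeirdVII", "XI: Small"))

pat <- "[IVXLCDM]{2,}|\\b[ivxlcdm]+\\b|\\b[IVXLCDM]+\\b"

# First Roman-numeral match per observation; NA where nothing matches
roman <- str_extract(x$col, pat)
has_roman <- !is.na(roman)

# Overwrite only the matching rows; the column itself stays character
res <- x$col
res[has_roman] <- as.character(gtools::roman2int(roman[has_roman]))
res
#  [1] "15"    "NA"    "0"     "Red"   "4"     "Logic" "9"     "8"     "16"
# [10] "7"     "11"
```

A caveat with either version: a real word made up entirely of Roman letters (for example "mix" or "dim") would still match the pattern, so the result is worth checking against the actual data.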