How to convert all the Roman numerals (such as "xxv," "xxxv," "iii," and "ii") in text data into numerical values with R?
I need to convert all the Roman numerals in my text data into numerical values. Is there a function in R that can convert all of them at once?
In addition, if I replace them one by one, what happens to words that contain letters like "ii" or "i"? For example, would the word "still" be changed into "st1ll"?
# Sample input: the text of the question itself
txt <- 'How to convert all the Latin numbers (such as "xxv," "xxxv," "iii," and "ii") into numerical values in text data with R?

I need to convert all the Latin numbers in a text data into numerical values. Is there any function in R can convert all the Latin numbers at once?

In addition, when I replace one by one, what if I have some words contains letters like "ii", "i"? For example, would the world "still" be changed into "st1ll"?'
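First, get a vector of Roman numerals as character strings. Note that if you make this vector too large, gregexpr will throw an error, presumably because the alternation pattern becomes too long for the regex engine. I didn't test exactly where the limit is, but it is somewhere between 1e2 and 1e3.
Exclude "I", since on its own it is more likely to be the pronoun than a numeral. Then build the pattern and treat this like any other find/replace. Because the pattern is wrapped in \\b word boundaries, only whole words are matched, so "still" is left unchanged: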
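# Numerals from I to C as strings; "I" is dropped on the next line
rom <- as.character(as.roman(1:1e2))
rom <- setdiff(rom, 'I')
# One whole-word alternation: \b(II|III|IV|...|C)\b
p <- sprintf('\\b(%s)\\b', paste0(na.omit(rom), collapse = '|'))
# Case-insensitive, so "xxv" matches XXV
m <- gregexpr(p, txt, ignore.case = TRUE)
# Replace each matched numeral with its numeric value
regmatches(txt, m) <- lapply(regmatches(txt, m), function(x) as.numeric(as.roman(x)))
cat(txt)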
# How to convert all the Latin numbers (such as "25," "35," "3," and "2") into numerical values in text data with R?
#
# I need to convert all the Latin numbers in a text data into numerical values. Is there any function in R can convert all the Latin numbers at once?
#
# In addition, when I replace one by one, what if I have some words contains letters like "2", "i"? For example, would the world "still" be changed into "st1ll"?
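To illustrate why the word boundaries settle the "still" concern, here is a small comparison using only base R's gsub(). The sample words are made up for illustration; the point is that a bare pattern also hits letters inside words, while a \\b-bounded pattern only matches stand-alone tokens.

```r
# A few sample tokens, including ordinary words that contain "i"
words <- c("still", "ii", "xxv,", "mix")

# Naive find/replace: "i" is replaced wherever it occurs, even inside words
naive <- gsub("i", "1", words)
print(naive)    # "st1ll" "11" "xxv," "m1x"

# Word-bounded pattern: "ii" is replaced only when it is a whole word
bounded <- gsub("\\bii\\b", "2", words, ignore.case = TRUE)
print(bounded)  # "still" "2" "xxv," "mix"
```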
Wrapped in a function and applied to a data frame column:
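# Example data: five copies of the sample text
# (stringsAsFactors = FALSE keeps the column as character in R < 4.0)
dd <- data.frame(
  texts = rep(txt, 5),
  stringsAsFactors = FALSE
)

# text: character vector; rom: integers whose numerals are matched;
# exclude: upper-case numerals to leave untouched
rom_to_num <- function(text, rom = 1:1e2, exclude = 'I') {
  rom <- as.character(as.roman(rom))
  rom <- setdiff(rom, exclude)
  # na.omit() drops the NAs that as.roman() gives for out-of-range values
  p <- sprintf('\\b(%s)\\b', paste0(na.omit(rom), collapse = '|'))
  m <- gregexpr(p, text, ignore.case = TRUE)
  regmatches(text, m) <- lapply(regmatches(text, m), function(x) as.numeric(as.roman(x)))
  text
}

# Add the converted text as a new column
dd <- within(dd, {
  texts_new <- rom_to_num(texts)
})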
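If you need numerals beyond the range where the enumerated pattern works, one alternative is sketched below (a minimal sketch; `rom_to_num2` is a hypothetical name, not part of the answer above). Instead of listing every numeral, it matches any whole word made only of the letters M, D, C, L, X, V and I, then converts only the words that are valid numerals in strict form. The trade-off is more false positives: ordinary tokens such as "mix" (1009), "DC" (600) or "cm" (900) are valid numerals, so the exclude list matters more.

```r
# Hypothetical alternative to rom_to_num(): no enumerated pattern, so no size limit
rom_to_num2 <- function(text, exclude = "I") {
  # Strict numeral from I to MMMCMXCIX, checked against upper-cased words
  strict <- "^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$"
  # Candidate words: made up only of Roman-numeral letters
  m <- gregexpr("\\b[MDCLXVI]+\\b", text, ignore.case = TRUE)
  regmatches(text, m) <- lapply(regmatches(text, m), function(x) {
    up <- toupper(x)
    idx <- which(grepl(strict, up) & !(up %in% toupper(exclude)))
    n <- as.numeric(utils::as.roman(x[idx]))
    # Leave a word unchanged if as.roman() could not convert it (NA)
    x[idx[!is.na(n)]] <- as.character(n[!is.na(n)])
    x
  })
  text
}

rom_to_num2('such as "xxv," "xxxv," "iii," and "ii" but not "still" or "I"')
# returns: such as "25," "35," "3," and "2" but not "still" or "I"
rom_to_num2("Chapter xii: mix well")
# returns: Chapter 12: 1009 well   ("mix" is a valid numeral)
rom_to_num2("Chapter xii: mix well", exclude = c("I", "MIX"))
# returns: Chapter 12: mix well
```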