I'm trying to replace currency symbols in my corpus with text, such as $ with dollar. For example:
x <- "i have \u20AC and \u0024 and \u00A3 and \u00A5 and \u20B9"
"i have € and $ and £ and ¥ and \u20b9"
The Unicode escapes work well for every currency symbol except the rupee. What could be the problem?
My second issue is that when I do a gsub, replacing by Unicode escape works for every symbol except the dollar.
sub('\u0024', 'dollar', x) ## which gives me
"i have € and $ and £ and ¥ and \u20b9dollar"
Replacing dollar could be done using this:
gsub("[$]", "dollar", x)
To view your x with the rupee in it, use cat:
> cat(x, sep="\n")
i have € and $ and £ and ¥ and ₹
>
To replace the dollar, use a literal string replacement by adding fixed=TRUE (so that you do not have to escape the $ symbol, which denotes the end of the string in a regex):
> x <- gsub("$", "dollar", x, fixed=TRUE)
> cat(x, sep="\n")
i have € and dollar and £ and ¥ and ₹
>
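The same literal replacement can be extended to all five symbols. This is only a minimal sketch: the vectors symbols and words, and the replacement words themselves, are my own assumptions and not something from your corpus.

```r
# Sample string from the question.
x <- "i have \u20AC and \u0024 and \u00A3 and \u00A5 and \u20B9"

# Hypothetical lookup: each symbol and the word to put in its place.
symbols <- c("\u20AC", "\u0024", "\u00A3", "\u00A5", "\u20B9")
words   <- c("euro", "dollar", "pound", "yen", "rupee")

# Replace one symbol at a time; fixed = TRUE treats each pattern literally,
# so "$" is not interpreted as a regex anchor.
result <- x
for (i in seq_along(symbols)) {
  result <- gsub(symbols[i], words[i], result, fixed = TRUE)
}

cat(result, sep = "\n")
```

A plain loop keeps each step a literal gsub call, so none of the symbols needs escaping.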
When you do not pass fixed=TRUE, sub and gsub parse the "$" as a regex pattern, and in a regex, $ denotes the end of the string. That is why in your results dollar is added after the rupee.
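To make the difference concrete, here is a small sketch comparing the anchor behaviour with three ways of matching a literal dollar sign. The sample string and the variable names (anchored, literal, escaped, bracket) are made up for illustration.

```r
x <- "i have $ here"

# "$" as a regex is the end-of-string anchor, so the text is appended at the end.
anchored <- gsub("$", "dollar", x)

# Three ways to match the literal character instead:
literal <- gsub("$", "dollar", x, fixed = TRUE)  # no regex interpretation at all
escaped <- gsub("\\$", "dollar", x)              # backslash-escaped in the regex
bracket <- gsub("[$]", "dollar", x)              # inside a character class

cat(anchored, literal, escaped, bracket, sep = "\n")
```

The last three calls should all give the same result; fixed=TRUE is usually the simplest choice when no regex features are needed.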