r, data-analysis, exploratory-data-analysis

How to clean a column with mixed values (currency stored as character, blanks, strings) in R


I have a dataset of startups with a column called "Amount", which holds the valuation of each startup. When I tried to plot it, the plot came out ugly, and I found that the values are stored as "char". When I inspected the column with table(copy$Amount), it showed currency amounts, text strings, blanks, and bare "$" symbols all mixed together.

I'm a beginner in R and have tried several small snippets, but nothing worked. I want to remove the rows that contain only text, the blank rows, and the rows that contain a "$" symbol with no number, and then convert the remaining values to numbers.


Solution

  • You can use parse_number from the readr package, which:

    ...drops any non-numeric characters before or after the first number. The grouping mark specified by the locale is ignored inside the number.

    For example:

    > x <- c("1,000", "$1,000", "$$1,000", "1,000$")
    > readr::parse_number(x)
    [1] 1000 1000 1000 1000
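
Applied to the data frame in the question, this could look roughly like the sketch below. The names `copy` and `Amount` come from the question. The sample values and the `Startup` column are made up for illustration. It relies on parse_number returning NA for entries that contain no number (text, blanks, a lone "$"), so those rows can then be dropped.

```r
library(readr)

# Hypothetical sample standing in for the real data
copy <- data.frame(
  Startup = c("A", "B", "C", "D", "E"),
  Amount  = c("$1,000,000", "undisclosed", "", "$", "2,500"),
  stringsAsFactors = FALSE
)

# Entries without a number become NA; parse_number warns about them,
# so the expected warnings are suppressed here
copy$Amount <- suppressWarnings(parse_number(copy$Amount))

# Keep only the rows where a number was found
clean <- copy[!is.na(copy$Amount), ]

str(clean$Amount)
```

One caveat: parse_number keeps the first number it finds, so a text entry that happens to contain digits (for example "Series 2") would be kept as a number rather than dropped. It is worth checking the result with summary(clean$Amount) before plotting.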