r, dplyr, gsub, temperature

How to remove everything except numeric elements in R


My apologies, because there are certainly many similar questions and answers, but I've tried a bunch of the suggested answers and sadly no dice.

I've got temperature data in three columns of a dataframe (tempdata). For simplicity I'm just trying to change one of these locations (wentworth.castle) at a time.

This is what my data looks like. All the columns with ".castle" in them are temperatures for that site. There are missing values, but this is expected; I'm hoping to turn them into NA.

glimpse(tempdata)
Rows: 3,395
Columns: 5
$ Description      <chr> "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", …
$ date.time        <chr> "22/11/2023 09:48", "22/11/2023 10:18", "22/11/2023 10:48", "22/11/2023 11:…
$ site.castle      <chr> "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",…
$ dover.castle     <chr> "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",…
$ wentworth.castle <chr> "9.484 \xb0C", "9.642 \xb0C", "9.768 \xb0C", "9.994 \xb0C", "10.066 \xb0C",…

I've tried the few things below and got the following errors.

tempdata$wentworth.castle <- gsub(" °C", "", as.numeric(tempdata$wentworth.castle))
#Error in is.factor(x) : invalid multibyte string at '<b0>C'

tempdata$wentworth.castle <- gsub(" \xb0C", "", as.numeric(tempdata$wentworth.castle))
#Error in is.factor(x) : invalid multibyte string at '<b0>C'

tempdata$wentworth.castle = tempdata$wentworth.castle.replace('\u00b0','', regex=True)
#Error: attempt to apply non-function

tempdata$wentworth.castle <- as.numeric(tempdata$wentworth.castle)
#Error: invalid multibyte string at '<b0>C'

I also tried a less robust approach: a function that keeps only the first few characters of each value. This is awkward because my data sometimes has 5 significant figures and sometimes 6, so even if it had worked I would still have had stray spaces to remove from some of the entries.

left = function(string, chat){substr(string, 1, char)}
tempdata$wentworth.castle <- left(tempdata$wentworth.castle, 6)
#Error in as.integer(stop) : 
#  cannot coerce type 'closure' to vector of type 'integer'

Solution

  • This is an encoding issue: the degree symbol is stored as the single Latin-1 byte \xb0, which is not valid UTF-8, so R cannot interpret the string. Use iconv to convert the values to UTF-8, then gsub to remove " °C":

    # data 
    wentworth <- c("9.484 \xb0C", "9.642 \xb0C", "9.768 \xb0C", "9.994 \xb0C", "10.066 \xb0C")
    
    gsub(" °C","", iconv(wentworth, from = "ISO-8859-1", to = "UTF-8"))
    
    # [1] "9.484"  "9.642"  "9.768"  "9.994"  "10.066"
    
    # or if you want it numeric, just wrap it
    as.numeric(gsub(" °C","", iconv(wentworth, from = "ISO-8859-1", to = "UTF-8")))
    
    # [1]  9.484  9.642  9.768  9.994 10.066
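
The same idea can be applied to every ".castle" column of the data frame at once. The following is only a sketch: the small `tempdata` here is a made-up stand-in for the real one, `clean_temp` is a hypothetical helper name, and it assumes the raw values really are Latin-1 (ISO-8859-1) as above. Rather than matching " °C" exactly, it drops every character that is not a digit, decimal point or minus sign. Empty strings become NA because `as.numeric("")` returns NA.

```r
# Made-up miniature of tempdata; column names borrowed from the question
tempdata <- data.frame(
  dover.castle     = c("", "", ""),
  wentworth.castle = c("9.484 \xb0C", "9.642 \xb0C", "10.066 \xb0C"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert Latin-1 to UTF-8, strip everything that is
# not a digit, decimal point or minus sign, then convert to numeric
clean_temp <- function(x) {
  x <- iconv(x, from = "ISO-8859-1", to = "UTF-8")
  as.numeric(gsub("[^0-9.-]", "", x))
}

# Apply the helper to every column whose name ends in ".castle"
castle_cols <- grep("\\.castle$", names(tempdata), value = TRUE)
tempdata[castle_cols] <- lapply(tempdata[castle_cols], clean_temp)
```

After this, the ".castle" columns are numeric, with NA wherever the original entry was empty.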