rdbitrailing

Automatically strip trailing whitespace when fetching data with `DBI::dbGetQuery()` in R?


I work with a database (of which I am not the DBA) that has character columns of length greater than the actual data.

Is it possible to automatically strip trailing whitespace when fetching data with DBI::dbGetQuery()? (i.e. something similar to utils::read.table(*, strip.white = TRUE))

# connect
library(DBI)
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# generate fake data
mytable <- data.frame(x = 1, y = LETTERS[1:3], z = paste(LETTERS[1:3], "   "))
dbWriteTable(con, "mytable", mytable)

# fetch data
(a <- dbGetQuery(con, "select * from mytable"))
#   x y     z
# 1 1 A A    
# 2 1 B B    
# 3 1 C C    

# trailing space are kept
sapply(a, nchar)
#      x y z
# [1,] 1 1 5
# [2,] 1 1 5
# [3,] 1 1 5

I hope I can avoid something like:

idx <- sapply(a, is.character)
a[idx] <- lapply(a[idx], trimws, which = "left", whitespace = "[ ]")
sapply(a, nchar)
#      x y z
# [1,] 1 1 1
# [2,] 1 1 1
# [3,] 1 1 1

If not, is it a good approach?


Solution

  • As long as you're using select *, there is nothing SQL is going to do for this. If you select them by-name (which is a "best practice" and in many areas the industry-standard), you can use TRIM:

    sqldf::sqldf("select x, y, trim(z) as z from mytable") |>
      str()
    # 'data.frame': 3 obs. of  3 variables:
    #  $ x: num  1 1 1
    #  $ y: chr  "A" "B" "C"
    #  $ z: chr  "A" "B" "C"
    

    There are also rtrim and ltrim for limiting which side of the string you trim trailing/leading blank space.