I have a CSV file on Windows:
name, age
Siân, 34
Brónagh, 45
François, 87
Alan, 23
,
I try to read this into R using:
library(arrow)
df <- read_csv_arrow("people.csv")
It loads the table but converts the name column to arrow_binary.
dput output:
structure(list(name = structure(list(as.raw(c(0x53, 0x69, 0xe2, 0x6e)),
as.raw(c(0x42, 0x72, 0xf3, 0x6e, 0x61, 0x67,0x68)),
as.raw(c(0x46, 0x72, 0x61, 0x6e, 0xe7, 0x6f, 0x69, 0x73)),
as.raw(c(0x41, 0x6c, 0x61, 0x6e)), NULL),
class = c("arrow_binary", "vctrs_vctr", "list"))),
row.names = c(NA, -5L), class = c("tbl_df","tbl", "data.frame"))
I've tried an explicit conversion of this column:
as.character(df$name)
> Can't convert `x` <arrow_binary> to <character>.
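For reference, the raw bytes in the dput output can be decoded by hand in base R. This is only a sketch: it assumes the file is Latin-1 (or Windows-1252, which agrees with Latin-1 for these bytes), and decode_latin1 is a hypothetical helper name, not part of arrow.

```r
# Raw bytes copied from the dput() output above; NULL is the empty last row.
name_raw <- list(
  as.raw(c(0x53, 0x69, 0xe2, 0x6e)),
  as.raw(c(0x42, 0x72, 0xf3, 0x6e, 0x61, 0x67, 0x68)),
  as.raw(c(0x46, 0x72, 0x61, 0x6e, 0xe7, 0x6f, 0x69, 0x73)),
  as.raw(c(0x41, 0x6c, 0x61, 0x6e)),
  NULL
)

# Hypothetical helper: decode one element, ASSUMING the bytes are Latin-1.
decode_latin1 <- function(x) {
  if (is.null(x)) return(NA_character_)  # keep missing values as NA
  iconv(rawToChar(x), from = "latin1", to = "UTF-8")
}

# One UTF-8 string per row, NA for the empty row.
name_chr <- vapply(name_raw, decode_latin1, character(1))
```

On the real data frame the same helper could presumably be applied with vapply(unclass(df$name), decode_latin1, character(1)), though fixing the encoding at read time is cleaner.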
I've also tried to use arrow's cast command following this
df %>% mutate(name = arrow::cast(name, string()))
But it can't find cast
> ! 'cast' is not an exported object from 'namespace:arrow'
Additionally, I've tried defining the data type in the read_csv_arrow call:
read_csv_arrow("people.csv",
col_types = schema(name = arrow::string()))
but this gives:
> ! Invalid: In CSV column #1: Row #1: CSV conversion error to string: invalid UTF8 data
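The error is consistent with the bytes shown in the dput output: 0xe2 on its own is not a valid UTF-8 sequence, so the file looks like it is in a single-byte encoding rather than UTF-8. A small base R check, assuming Latin-1 as the actual encoding (the variable names are illustrative only):

```r
# Rebuild two of the names from the raw bytes in the dput() output.
sian <- rawToChar(as.raw(c(0x53, 0x69, 0xe2, 0x6e)))  # contains 0xe2
alan <- rawToChar(as.raw(c(0x41, 0x6c, 0x61, 0x6e)))  # plain ASCII

# validUTF8() reports whether each string's bytes form valid UTF-8.
is_valid <- validUTF8(c(sian, alan))
# FALSE for the first name, TRUE for the second

# After re-encoding (ASSUMING the source bytes are Latin-1) the name is valid.
sian_utf8 <- iconv(sian, from = "latin1", to = "UTF-8")
is_valid_after <- validUTF8(sian_utf8)
# TRUE
```

Only the names containing accented characters fail the check, which matches the reader rejecting the column when asked for a string type.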
I would like to use UTF-16, but it doesn't appear to be a data type that arrow accepts.
From the comments:
read_csv_arrow("people.csv", read_options = list(encoding = "latin1"))