I am trying to bring in multiple .csv datasets with accounting data into a single large dataset using vroom
. The individual files often vary in their composition, so I need to assign column types that force NA
to be numeric when applicable. But, when I attempt to assign col_types()
, I am unable to get a desirable output.
Name | Value |
---|---|
A | (10,000.00) |
B | -10,000.00 |
B | 10,000.00 |
col_number()
vroom("example.csv",
delim = ",",
col_types = cols(Name = col_character(),
Value = col_number()))
Results
# A tibble: 3 × 2
Name Value
<chr> <dbl>
1 A 10000
2 B -10000
3 C 10000
So, col_number()
is not evaluating the parentheses as a negative value.
col_double()
vroom("example.csv",
delim = ",",
col_types = cols(Name = col_character(),
Value = col_double()))
Results
# A tibble: 3 × 2
Name Value
<chr> <dbl>
1 A NA
2 B NA
3 C NA
So, col_double()
has an issue the big mark formatting comma (,).
Is there an option to navigate around this issue?
The data starts out in a .xls file from the source, using parentheses to denote negatives and even after saving to a .csv file where it changes the view to a negative sign instead, reading into vroom
seems to take their formatting as still having the parentheses.
My previous solution was very bloated reading in the .csv files with read_csv
and then doing manual manipulation with gsub
to remove the issues caused by the big mark & parentheses.
I'm curious if there's a simpler way, but one way would be to bring in as character, convert (
to -
, and then convert to numeric.
df <- vroom::vroom(file = I("Name,Val\nA,(10)\nB,-10\nB,10"),
delim = ",",
col_types = cols(Name = col_character(),
Val = col_character()))
df
# A tibble: 3 × 2
Name Val
<chr> <chr>
1 A (10)
2 B -10
3 B 10
df$Val = readr::parse_number(gsub("\\(", "-", df$Val))
df
Result
Name Val
<chr> <dbl>
1 A -10
2 B -10
3 B 10