rvroom

Vroom converts negative accounting values positive in R


I am trying to bring in multiple .csv datasets with accounting data into a single large dataset using vroom. The individual files often vary in their composition, so I need to assign column types that force NA to be numeric when applicable. But, when I attempt to assign col_types(), I am unable to get a desirable output.

example.csv

Name Value
A (10,000.00)
B -10,000.00
B 10,000.00

Using col_number()

vroom("example.csv", 
      delim = ",",
      col_types = cols(Name = col_character(),
                       Value = col_number()))

Results

# A tibble: 3 × 2
  Name   Value
  <chr>  <dbl>
1 A      10000
2 B     -10000
3 C      10000

So, col_number() is not evaluating the parentheses as a negative value.

Using col_double()

vroom("example.csv", 
      delim = ",",
      col_types = cols(Name = col_character(),
                       Value = col_double()))

Results

# A tibble: 3 × 2 
  Name   Value
  <chr>  <dbl>
1 A      NA
2 B      NA
3 C      NA

So, col_double() has an issue the big mark formatting comma (,).

Is there an option to navigate around this issue? The data starts out in a .xls file from the source, using parentheses to denote negatives and even after saving to a .csv file where it changes the view to a negative sign instead, reading into vroom seems to take their formatting as still having the parentheses.

My previous solution was very bloated reading in the .csv files with read_csv and then doing manual manipulation with gsub to remove the issues caused by the big mark & parentheses.


Solution

  • I'm curious if there's a simpler way, but one way would be to bring in as character, convert ( to -, and then convert to numeric.

    df <- vroom::vroom(file = I("Name,Val\nA,(10)\nB,-10\nB,10"),
                 delim = ",",
                 col_types = cols(Name = col_character(),
                                  Val = col_character()))
    df 
    # A tibble: 3 × 2
      Name  Val  
      <chr> <chr>
    1 A     (10) 
    2 B     -10  
    3 B     10
    
    
    df$Val = readr::parse_number(gsub("\\(", "-", df$Val))
    df     
    

    Result

      Name    Val
      <chr> <dbl>
    1 A       -10
    2 B       -10
    3 B        10