r, sqlite, r-dbi, dbplyr

How to ignore delimiters inside quoted strings when importing a csv file with RSQLite?


I want to import a CSV file with a structure similar to the example below:

var1;var2;var3
"a";1;"Some text"
"b";0;"More text"
"c";0;"Delimiter in ; middle of the text"

Traditional parsers such as the one used by data.table::fread handle a delimiter inside a quoted string correctly by default. I want to import this data into a SQLite database with RSQLite::dbWriteTable.

con <- DBI::dbConnect(RSQLite::SQLite(), dbname = "mydb.sqlite")
DBI::dbWriteTable(conn = con, name = "my_table", value = "data_file.csv")

dbWriteTable has no option to specify a quote character, so the function throws an error when it reaches the problematic line. How can I import this data? My only constraint is that I don't have enough memory to parse the data with R before importing it into SQLite.


Solution

  • Install the csvfix utility, which is available for Windows and Linux, and then try the test code below. It worked for me on Windows. On other platforms you may need to adjust it slightly, in particular the shell line (shell() exists only in R for Windows) and the eol= argument, which you may not need or which may need a different value. We use csvfix to remove the quotes and to replace the semicolons that act as field separators (those outside quoted fields) with @. We then use @ as the separator when reading the file into the database.

    First we create the test data.

    # if (file.exists("mydb")) file.remove("mydb")
    # if (file.exists("data_file2.csv")) file.remove("data_file2.csv")
    
    # write out test file
    cat('var1;var2;var3\n"a";1;"Some text"\n"b";0;"More text"\n"c";0;"Delimiter in ; middle of the text"', file = "data_file.csv")
    
    # create database (can omit if it exists)
    cat(file = "mydb")
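
    The @ separator only works if @ never occurs in the data itself. Here is a minimal sketch of a check, assuming base R only. It scans the file in chunks, so the whole file never has to fit in memory. The function name file_contains and the chunk size are illustrative and not part of the original answer.

    ```r
    # Hypothetical helper: TRUE if the literal string `ch` occurs anywhere in the file.
    # Reads `chunk_lines` lines at a time to keep memory use bounded.
    file_contains <- function(path, ch = "@", chunk_lines = 10000L) {
      con <- file(path, open = "r")
      on.exit(close(con))
      repeat {
        lines <- readLines(con, n = chunk_lines, warn = FALSE)
        if (length(lines) == 0L) return(FALSE)  # end of file, nothing found
        # fixed = TRUE treats `ch` as a literal string, not a regular expression
        if (any(grepl(ch, lines, fixed = TRUE, useBytes = TRUE))) return(TRUE)
      }
    }

    # Example: check the test file before choosing @ as the replacement separator
    if (file.exists("data_file.csv")) print(file_contains("data_file.csv", "@"))
    ```

    If this returns TRUE, pick another single character that does not occur in the file and use it in place of @ in both the preprocessing command and the sep= argument.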
    

    csvfix

    Now process the data file with csvfix.

    library(RSQLite)
    
    # preprocess file using csvfix - modify next line as needed depending on platform
    shell("csvfix write_dsv -sep ; -s @ data_file.csv > data_file2.csv")
    file.show("data_file2.csv") # omit this line for real data
    
    # write file to database
    con <- dbConnect(SQLite(), "mydb")
    dbWriteTable(con, "myFile", "data_file2.csv", sep = "@", eol = "\r\n")
    dbGetQuery(con, "select * from myFile") # omit this line for real data
    dbDisconnect(con)
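
    Since shell() is only available in R for Windows, the preprocessing step can be sketched with system2(), which exists on all platforms. This is an untested-on-your-system sketch, not the original answer's code: run_tool is a hypothetical helper name, and the csvfix arguments are simply those from the shell line above. The semicolon is passed through shQuote() because on Unix shells an unquoted ; would end the command.

    ```r
    # Hypothetical helper: run a command-line tool and send its standard output
    # to `outfile` (this replaces the `> data_file2.csv` redirection).
    run_tool <- function(tool, args, outfile) {
      status <- system2(tool, args, stdout = outfile)
      if (status != 0) stop(tool, " failed with exit status ", status)
      invisible(outfile)
    }

    # Same csvfix call as above; only runs if csvfix is found on the PATH
    if (nzchar(Sys.which("csvfix"))) {
      run_tool("csvfix",
               c("write_dsv", "-sep", shQuote(";"), "-s", "@",
                 shQuote("data_file.csv")),
               outfile = "data_file2.csv")
    }
    ```

    On Linux or macOS the eol = "\r\n" argument to dbWriteTable will probably not be needed, because the tool is likely to write plain "\n" line endings there.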
    

    xsv

    Alternatively, install the xsv utility (written in Rust; prebuilt binaries are available from its releases page). This worked for me on Windows.

    library(RSQLite)
    
    shell("xsv fmt -d ; -t @ data_file.csv > data_file2.csv")
    file.show("data_file2.csv") # omit this line for real data
    
    # write file to database
    con <- dbConnect(SQLite(), "mydb")
    dbWriteTable(con, "myFile", "data_file2.csv", sep = "@")
    dbGetQuery(con, "select * from myFile") # omit this line for real data
    dbDisconnect(con)
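
    For real data, where printing the whole table is not practical, a row count comparison is a cheap sanity check. The sketch below assumes base R only and assumes that no field contains an embedded line break, so that one line equals one record. count_data_rows is a hypothetical helper name, not part of the original answer.

    ```r
    # Hypothetical helper: count non-empty lines in a file, minus the header line.
    # Reads in chunks so the file never has to fit in memory.
    count_data_rows <- function(path, chunk_lines = 10000L) {
      con <- file(path, open = "r")
      on.exit(close(con))
      n <- 0
      repeat {
        lines <- readLines(con, n = chunk_lines, warn = FALSE)
        if (length(lines) == 0L) break
        n <- n + sum(nzchar(lines, type = "bytes"))  # skip blank lines
      }
      max(n - 1, 0)  # subtract the header; never return a negative count
    }

    # Possible usage after the import (left commented because it needs the database):
    # con <- dbConnect(SQLite(), "mydb")
    # dbGetQuery(con, "select count(*) as n from myFile")$n ==
    #   count_data_rows("data_file2.csv")
    # dbDisconnect(con)
    ```

    If the two counts differ, some line was probably split or merged during import, most likely because the chosen separator or eol value does not match the preprocessed file.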