rdatabasecsvexport-to-csvduckdb

How to export a tbl_duckdb_connection object to CSV from duckdb without collect()?


I have a dataset (originally large CSV) that I filtered using duckdb and dbplyr.

This is a small script that get to my idea :

library(duckdb) 
library(DBI)
library(dplyr)
library(dbplyr)

path_data_csv = "data.csv"
data = duckdb::tbl_file(DBI::dbConnect(duckdb()), path_data_csv)
new_data = data |> 
  filter(datasetKey == '12345678', 
         kingdom %in% c('Animalia')) |> 
  collect()

Then since the data is in memory, I can export it using write.csv() :

write.csv(x = new_data, 
          file = 'newdata.csv', 
          row.names = FALSE)

But I was wondering if there is a way to export it directly into CSV without 'collecting' the data first. Essentially, I want to export the data of class c("tbl_duckdb_connection", "tbl_dbi", "tbl_sql", "tbl_lazy", "tbl") to CSV without materialization. I wish to find something that is not requiring to use dbExecute or dbWriteTable (which would be ways to do it).


Solution

  • We can use dbplyr::sql_render() to generate the query and then construct a DuckDB copy statement to export the results to a CSV file.

    library(duckdb) 
    library(DBI)
    library(dplyr)
    library(dbplyr)
    
    path_data_csv = "data.csv"
    write.csv(iris, path_data_csv)  ## example dataset
    
    data = duckdb::tbl_file(DBI::dbConnect(duckdb()), path_data_csv)
    con <- DBI::dbConnect(duckdb())
    
    data %>% 
      dplyr::filter(Sepal.Length > 4.5, 
             Species %in% c('virginica')) %>% 
      dbplyr::sql_render() %>% 
      as.character() %>% 
      paste0("COPY (", ., ") TO 'newdata.csv' (FORMAT CSV, HEADER)") %>% 
      DBI::dbExecute(con, .)
    #> [1] 50
    

    Or as Jon mentioned in the comments, load/install {duckplyr} to take advantage of its compute_csv() function to execute the query and write the results directly to CSV; it also returns a duckplyr_df object for potential further operations. Note that you need to convert to "duckplyr_df" using duckplyr::as_duckdb_tibble() first.

    library(duckplyr)
    
    data %>%  
      dplyr::filter(Sepal.Length > 4.5, 
             Species %in% c('virginica')) %>% 
      duckplyr::as_duckdb_tibble() %>% 
      duckplyr::compute_csv("newdata.csv",
                            options = list(header = TRUE, delimiter = ","))
    
    #> # A duckplyr data frame: 6 variables
    #>    column0 Sepal.Length Sepal.Width Petal.Length Petal.Width Species  
    #>      <dbl>        <dbl>       <dbl>        <dbl>       <dbl> <chr>    
    #>  1     101          6.3         3.3          6           2.5 virginica
    #>  2     102          5.8         2.7          5.1         1.9 virginica
    #>  3     103          7.1         3            5.9         2.1 virginica
    #>  4     104          6.3         2.9          5.6         1.8 virginica
    #>  5     105          6.5         3            5.8         2.2 virginica
    #>  6     106          7.6         3            6.6         2.1 virginica
    #>  7     107          4.9         2.5          4.5         1.7 virginica
    #>  8     108          7.3         2.9          6.3         1.8 virginica
    #>  9     109          6.7         2.5          5.8         1.8 virginica
    #> 10     110          7.2         3.6          6.1         2.5 virginica
    #> # ℹ more rows
    

    Created on 2025-09-14 with reprex v2.1.1