Tags: ruby-on-rails, ruby, export-to-csv

Rails: How to export 100 million records to CSV


I want to export my sales data to a CSV file in a rake task. The sales table has roughly 100 million entries. How can I optimize this code? It seems to slow down the longer it runs.

sales = Sales.where(year: 2024)
BATCH_SIZE = 10_000
header_names = %w[user_id article_id amount currency date]
batch = []
File.open(filepath, 'w') do |file|
  sales.find_each(batch_size: BATCH_SIZE) do |sale|
    batch << sale.slice(*header_names).values.join(';')
    if batch.size >= BATCH_SIZE
      file.puts batch.join("\n")
      batch.clear
    end
  end
  file.puts batch.join("\n") unless batch.empty?
end

This takes roughly 23 minutes to export 17 million entries.


Solution

  • It's better to use raw SQL for this (PostgreSQL example below) instead of building each row in Ruby

    It's faster

    Basic idea

    require 'csv'
    
    connection = ActiveRecord::Base.connection.raw_connection
    attributes = %i[user_id article_id amount currency date]
    data_query = Sales.select(*attributes).where(year: 2024).to_sql
    sql_query = "COPY (#{data_query}) TO STDOUT WITH CSV"
    
    File.open(filepath, 'w') do |file|
      file.puts(attributes.to_csv) # write the header row yourself; COPY only streams data rows

      # Stream each row straight from PostgreSQL into the file
      connection.copy_data(sql_query) do
        while (row = connection.get_copy_data)
          file.puts(row)
        end
      end
    end
    

    In this case you don't instantiate ActiveRecord objects

    Data is written directly to the file instead of building large in-memory arrays
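
    Since the export runs from a rake task, here is a minimal sketch of how the COPY approach above could be wrapped in one. The namespace, task name, and output path are assumptions for illustration, not part of the original question:

    # lib/tasks/sales_export.rake (hypothetical path)
    require 'csv'

    namespace :sales do
      desc 'Export 2024 sales to CSV via PostgreSQL COPY'
      task export: :environment do
        filepath   = Rails.root.join('tmp', 'sales_2024.csv') # assumed destination
        attributes = %i[user_id article_id amount currency date]

        connection = ActiveRecord::Base.connection.raw_connection
        data_query = Sales.select(*attributes).where(year: 2024).to_sql

        File.open(filepath, 'w') do |file|
          file.puts(attributes.to_csv) # header row

          # Stream rows straight from PostgreSQL into the file
          connection.copy_data("COPY (#{data_query}) TO STDOUT WITH CSV") do
            while (row = connection.get_copy_data)
              file.puts(row)
            end
          end
        end
      end
    end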

    For a custom delimiter:

    COPY (#{data_query}) TO STDOUT WITH CSV DELIMITER ';'
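
    Below is a sketch of the full variant with a ';' separator, matching the semicolon-delimited output of the original Ruby code; the only wrinkle is that the hand-written header row has to use the same separator, since Array#to_csv defaults to commas:

    connection = ActiveRecord::Base.connection.raw_connection
    attributes = %i[user_id article_id amount currency date]
    data_query = Sales.select(*attributes).where(year: 2024).to_sql
    # DELIMITER only changes the column separator; quoting still follows CSV rules
    sql_query = "COPY (#{data_query}) TO STDOUT WITH CSV DELIMITER ';'"

    File.open(filepath, 'w') do |file|
      file.puts(attributes.join(';')) # header must use the same delimiter

      connection.copy_data(sql_query) do
        while (row = connection.get_copy_data)
          file.puts(row)
        end
      end
    end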