I want to export my sales data to a CSV file in a rake task. The sales table has roughly 100 million entries. How can I optimize this code? It seems to slow down the longer it runs.
sales = Sales.where(year: 2024)
BATCH_SIZE = 10_000
header_names = %w[user_id article_id amount currency date]
batch = []

File.open(filepath, 'w') do |file|
  sales.find_each(batch_size: BATCH_SIZE) do |sale|
    batch << sale.slice(*header_names).values.join(';')
    if batch.size >= BATCH_SIZE
      file.puts batch.join("\n")
      batch.clear
    end
  end
  file.puts batch.join("\n") unless batch.empty?
end
This takes me roughly 23 minutes to export 17 million entries.
It's better to use raw SQL for this instead of Ruby (PostgreSQL example): the COPY command builds the CSV on the database server and streams it to the client, which is much faster than instantiating Active Record objects. Basic idea:
require 'csv'

# raw_connection exposes the underlying PG::Connection
connection = ActiveRecord::Base.connection.raw_connection
attributes = %i[user_id article_id amount currency date]
data_query = Sales.select(*attributes).where(year: 2024).to_sql
sql_query = "COPY (#{data_query}) TO STDOUT WITH CSV"

File.open(filepath, 'w') do |file|
  # COPY ... WITH CSV does not emit a header row, so write it ourselves
  file.puts(attributes.to_csv)
  connection.copy_data(sql_query) do
    # get_copy_data returns one CSV row per call, and nil when the copy is done
    while (row = connection.get_copy_data)
      file.puts(row)
    end
  end
end
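
Since you run this in a rake task, wiring the export up could look like the sketch below; the file name, namespace, task name, and output path are placeholders, not something your app already has:

# lib/tasks/sales.rake (names and path are illustrative)
namespace :sales do
  desc 'Export 2024 sales to CSV via PostgreSQL COPY'
  task export: :environment do
    require 'csv'

    filepath = Rails.root.join('tmp', 'sales_2024.csv')
    attributes = %i[user_id article_id amount currency date]
    data_query = Sales.select(*attributes).where(year: 2024).to_sql
    sql_query = "COPY (#{data_query}) TO STDOUT WITH CSV"
    connection = ActiveRecord::Base.connection.raw_connection

    File.open(filepath, 'w') do |file|
      file.puts(attributes.to_csv)
      connection.copy_data(sql_query) do
        while (row = connection.get_copy_data)
          file.puts(row)
        end
      end
    end
  end
end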
With the COPY approach you don't instantiate Active Record objects, and the data is streamed directly to the file instead of being accumulated in large in-memory arrays.
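
If COPY is not an option (for example on a different database), you can still avoid instantiating models from Ruby by plucking raw values in batches; a minimal sketch, reusing the column list and semicolon delimiter from the question:

File.open(filepath, 'w') do |file|
  # in_batches yields relations; pluck fetches plain values,
  # skipping Active Record object instantiation entirely
  Sales.where(year: 2024).in_batches(of: 10_000) do |relation|
    rows = relation.pluck(:user_id, :article_id, :amount, :currency, :date)
    file.puts(rows.map { |row| row.join(';') }.join("\n"))
  end
end

This is typically much faster than find_each with full models, though still slower than letting the database write the CSV.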
For a custom delimiter:

sql_query = "COPY (#{data_query}) TO STDOUT WITH CSV DELIMITER ';'"
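
With the newer options syntax (PostgreSQL 9.0+) you can also have the server emit the header row itself, which lets you drop the manual file.puts(attributes.to_csv) line:

# HEADER makes COPY output the column names as the first row
sql_query = "COPY (#{data_query}) TO STDOUT WITH (FORMAT csv, DELIMITER ';', HEADER true)"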