ruby io

most efficient way to write data into a file


I want to write 2 TB of data into one file; in the future it might be a petabyte.

The data consists entirely of the character '1'. For example, 2 TB of data would be "1111111111111......11111" (every byte is the character '1').

This is my current approach:

File.open("data",File::RDWR||File::CREAT) do |file|
  2*1024*1024*1024*1024.times do
  file.write('1')
  end
end

That means file.write is called once per byte, about 2 trillion times for 2 TB. Is there a better way to do this in Ruby?


Solution

  • You have a few problems:

    1. File::RDWR||File::CREAT always evaluates to File::RDWR, because every Integer is truthy in Ruby and || returns its first truthy operand. You mean File::RDWR|File::CREAT (bitwise | rather than logical ||).

    2. 2*1024*1024*1024*1024.times do runs the loop only 1024 times, because .times binds to the last 1024 before any multiplication happens; the loop's return value is then multiplied by the factors on the left. You mean (2*1024*1024*1024*1024).times do.
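Both pitfalls above are easy to check in isolation. The following is a minimal sketch with small numbers standing in for the real sizes; the variable names are only illustrative.

```ruby
# Pitfall 1: `||` is logical OR. Every Integer is truthy in Ruby,
# so the left operand is returned and File::CREAT is never considered.
logical_or = File::RDWR || File::CREAT
bitwise_or = File::RDWR | File::CREAT

puts logical_or == File::RDWR            # the CREAT flag was silently dropped
puts((bitwise_or & File::CREAT) != 0)    # the CREAT bit is set with `|`

# Pitfall 2: a method call binds tighter than `*`,
# so only the last factor controls the loop.
iterations = 0
result = 2 * 3.times { iterations += 1 }
puts iterations   # the block ran 3 times, not 6
puts result       # Integer#times returns its receiver (3), so this is 2 * 3

# With parentheses the whole product is the receiver of `times`.
paren_iterations = 0
(2 * 3).times { paren_iterations += 1 }
puts paren_iterations
```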

    Regarding your question, I get a significant speedup by writing 1024 bytes at a time:

    File.open("data",File::RDWR|File::CREAT) do |file|
      buf = "1" * 1024
      (2*1024*1024*1024).times do
        file.write(buf)
      end
    end
    

    You might experiment and find a better buffer size than 1024.
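One way to run that experiment is to time a small sample write with several buffer sizes before committing to the full run. This is only a sketch: write_ones is a hypothetical helper, and the 16 MiB sample size and the three candidate buffer sizes are arbitrary assumptions, not recommendations.

```ruby
require "benchmark"
require "tempfile"

# Hypothetical helper: writes total_bytes of '1' to path in buf_size chunks.
def write_ones(path, total_bytes, buf_size)
  buf = "1" * buf_size
  full_chunks, remainder = total_bytes.divmod(buf_size)
  File.open(path, File::WRONLY | File::CREAT | File::TRUNC) do |file|
    full_chunks.times { file.write(buf) }
    # Write the leftover bytes when total_bytes is not a multiple of buf_size.
    file.write(buf[0, remainder]) if remainder > 0
  end
end

sample_bytes = 16 * 1024 * 1024  # small sample; scale up for a more realistic test

[1024, 64 * 1024, 1024 * 1024].each do |size|
  Tempfile.create("ones") do |tmp|
    tmp.close  # only the path is needed; write_ones reopens the file
    seconds = Benchmark.realtime { write_ones(tmp.path, sample_bytes, size) }
    puts format("%8d-byte buffer: %.3f s", size, seconds)
  end
end
```

Timings depend heavily on the disk, the filesystem, and the OS page cache, so it is worth repeating the run on the target machine with a sample larger than available memory.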