ruby fastercsv

Merge CSV files on a common field with Ruby/FasterCSV


I have a 'master' file with a number of columns: 1 2 3 4 5. I have a few other files, each with fewer rows than the master file and with columns: 1 6. I'd like to merge these files by matching on the column 1 field and adding column 6 to the master. I've seen some Python/UNIX solutions but would prefer to use Ruby/FasterCSV if it's a good fit. I would appreciate any help getting started.


Solution

  • FasterCSV has been the standard CSV implementation since Ruby 1.9, so require 'csv' is all you need there. This code is untested, but should work.

    require 'csv'

    master = CSV.read('master.csv')       # Read the master file into an array of rows
    master.each { |row| row.push('') }    # Append an empty column to every row
    Dir.glob('*.csv').each do |path|      # Go through all CSV files in the current directory
      # Skip the master itself and the output of any previous run
      next if ['master.csv', 'output.csv'].include?(path)
      CSV.foreach(path) do |line|         # Go through each row of the file
        row = master.assoc(line[0])       # Find the master row whose first field matches
        row[-1] = line[1] if row          # Fill in the last column if a match was found
      end
    end

    # The block form closes the output file when done, so every row is flushed to disk
    CSV.open('output.csv', 'wb') do |csv|
      master.each { |row| csv << row }    # Write the modified master out
    end
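
If the master file is large, the linear scan done by Array#assoc for every line can get slow. Below is a minimal sketch of the same merge with the master rows indexed in a Hash first. The method name merge_on_first_column and its parameters are hypothetical, chosen for illustration; only CSV.read, CSV.foreach and CSV.open come from the standard library. It assumes no header row and that the key is the first field of every file.

```ruby
require 'csv'

# Hypothetical helper (not part of the CSV library): appends one new column to
# every master row and fills it from the extra files where column 1 matches.
def merge_on_first_column(master_path, extra_paths, output_path)
  master = CSV.read(master_path)
  master.each { |row| row << '' }        # new, initially empty column

  # Index master rows by their first field. As with Array#assoc, the first
  # row seen for a given key is the one that gets updated.
  index = {}
  master.each { |row| index[row[0]] ||= row }

  extra_paths.each do |path|
    CSV.foreach(path) do |line|
      row = index[line[0]]
      row[-1] = line[1] if row           # keys missing from master are ignored
    end
  end

  # Block form closes the file, so all rows are flushed to disk.
  CSV.open(output_path, 'wb') do |csv|
    master.each { |row| csv << row }
  end
  master
end
```

It could be called as merge_on_first_column('master.csv', Dir.glob('*.csv') - ['master.csv', 'output.csv'], 'output.csv'). If several extra files contain the same key, the file processed last overwrites the earlier value, which is also how the loop above behaves.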