How do I merge two CSV's with nearly identical sets using rules for which data is kept? (Using Ruby & FasterCSV)

I have two csv files, each with 13 columns.

The first column of each row contains a unique string. Some are duplicated in each file, some only exist in one file.

If the row exists in only one file I want to keep it in the new file.

If it exists in both I want to keep the one that has a certain value (or lacks a certain value) in a certain column of that same row.

For example:

file 1:

D600-DS-1991, name1, address1, date1
D601-DS-1991, name2, address2, date2
D601-DS-1992, name3, address3, date3

file 2:

D600-DS-1991, name1, address1, time1
D601-DS-1992, dave1, address2, date2

I would keep the first row of the first file because the fourth column contains date instead of time. I would keep the second row of the first file since its first column, first row value is unique. I would keep the second row of the second file as the third row of the new file because it contains text other than "name#" in the second column.

Should I first map all of the unique values to one another so that each file contains the same number of entries - even if some are blank or just have filler data?

I only know a little ruby and python... but I much prefer to solve this with a single Ruby file if at all possible since I will be able to understand the code better. If you can't do it in Ruby then please feel free to answer differently!

Solution

I'm not super happy with my solution but it works:

require 'csv'

def readcsv(filename)
  csv = {}
  CSV.foreach(filename) do |line|
    csv[line[0]] = { name: line[1], address: line[2], date: line[3] }
  end
  csv
end

csv1 = readcsv('orders1.csv')
csv2 = readcsv('orders2.csv')

results = {}
csv1.each do |id, val|
  unless csv2[id]
    results[id] = val # checks to see if it only exists in 1 file
    next
  end

  #see if name exists
  if (val[:name] =~ /name/) && (csv2[id]) && (csv2[id][:name] =~ /name/).nil?
    csv1.delete(id)
  end

  #missing some if statement regarding date vs. time
end

results = results.merge(csv2) # merge together whatever is remaining

CSV.open('newfile.csv', 'w') do |csv|
  results.each do |key, val|
    row = []
    csv << (row.push(key, val.values)).flatten
  end
end

Output of newfile.csv :

D601-DS-1991, name2, address2, date2
D600-DS-1991, name1, address1, time1
D601-DS-1992, dave1, address2, date2