I have two csv files, each with 13 columns.
The first column of each row contains a unique string. Some are duplicated in each file, some only exist in one file.
If the row exists in only one file I want to keep it in the new file.
If it exists in both I want to keep the one that has a certain value (or lacks a certain value) in a certain column of that same row.
For example:
file 1:
D600-DS-1991, name1, address1, date1
D601-DS-1991, name2, address2, date2
D601-DS-1992, name3, address3, date3
file 2:
D600-DS-1991, name1, address1, time1
D601-DS-1992, dave1, address2, date2
I would keep the first row of the first file because the fourth column contains date
instead of time
.
I would keep the second row of the first file since its first column, first row value is unique.
I would keep the second row of the second file as the third row of the new file because it contains text other than "name#" in the second column.
Should I first map all of the unique values to one another so that each file contains the same number of entries - even if some are blank or just have filler data?
I only know a little ruby and python... but I much prefer to solve this with a single Ruby file if at all possible since I will be able to understand the code better. If you can't do it in Ruby then please feel free to answer differently!
I'm not super happy with my solution but it works:
require 'csv'
def readcsv(filename)
csv = {}
CSV.foreach(filename) do |line|
csv[line[0]] = { name: line[1], address: line[2], date: line[3] }
end
csv
end
csv1 = readcsv('orders1.csv')
csv2 = readcsv('orders2.csv')
results = {}
csv1.each do |id, val|
unless csv2[id]
results[id] = val # checks to see if it only exists in 1 file
next
end
#see if name exists
if (val[:name] =~ /name/) && (csv2[id]) && (csv2[id][:name] =~ /name/).nil?
csv1.delete(id)
end
#missing some if statement regarding date vs. time
end
results = results.merge(csv2) # merge together whatever is remaining
CSV.open('newfile.csv', 'w') do |csv|
results.each do |key, val|
row = []
csv << (row.push(key, val.values)).flatten
end
end
Output of newfile.csv
:
D601-DS-1991, name2, address2, date2
D600-DS-1991, name1, address1, time1
D601-DS-1992, dave1, address2, date2