I have two large .csv files that I would like to join.
file1.csv has the following structure:
productcode; *many useless columns* ; startdate; enddate; *some other useless columns*
file2.csv has the following structure:
productcode; *many useless columns different from file1* ; page; startdate; enddate; *some other useless columns*
I would like to join the two files into a file (let's say, out.csv) with the same structure as file1.csv but with the "page" column from file2.csv, i.e.
productcode; *useless columns* ; page; startdate; enddate; *useless columns*
The join conditions are same productcode and overlapping dates, i.e.:
file1.productcode == file2.productcode
and
!(file1.enddate<file2.startdate or file2.enddate<file1.startdate)
However, I have no idea how to do that. One possibility would be to import the two CSVs into MySQL, process them there, and then export the result to a final CSV file. However, that takes time and is resource-consuming.
I'm open to any suggestions.
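Load both files with pandas and combine them on productcode, either with DataFrame.merge() or with .join() (which joins on the index, so you would set productcode as the index first). A plain join only matches on equal keys, so the overlapping-dates condition has to be applied afterwards as a filter on the merged rows.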
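Here is a minimal sketch of that approach. It assumes the header names are exactly productcode, page, startdate and enddate, that file1 has no "page" column of its own, and that the dates are in a format pandas can parse. The function name join_overlapping and the sample rows are made up for illustration. It uses merge() rather than .join() because merge() can match on a column directly.

```python
import io

import pandas as pd


def join_overlapping(df1, df2):
    """Add file2's "page" to file1 rows with the same productcode and overlapping dates."""
    # Keep only the file2 columns the join needs, and rename its date
    # columns so they do not clash with file1's startdate/enddate.
    right = df2[["productcode", "page", "startdate", "enddate"]].rename(
        columns={"startdate": "startdate_2", "enddate": "enddate_2"}
    )
    # Equality part of the join condition.
    merged = df1.merge(right, on="productcode", how="inner")
    # Overlap part: !(file1.enddate < file2.startdate or file2.enddate < file1.startdate)
    overlap = ~(
        (merged["enddate"] < merged["startdate_2"])
        | (merged["enddate_2"] < merged["startdate"])
    )
    result = merged.loc[overlap].drop(columns=["startdate_2", "enddate_2"])
    # Same column order as file1, with "page" placed just before "startdate".
    cols = list(df1.columns)
    cols.insert(cols.index("startdate"), "page")
    return result[cols].reset_index(drop=True)


# Tiny in-memory sample standing in for file1.csv and file2.csv (made-up data).
file1 = io.StringIO(
    "productcode;other1;startdate;enddate\n"
    "A;x;2020-01-01;2020-01-31\n"
    "B;y;2020-02-01;2020-02-28\n"
)
file2 = io.StringIO(
    "productcode;other2;page;startdate;enddate\n"
    "A;p;10;2020-01-15;2020-02-15\n"
    "A;q;11;2020-03-01;2020-03-31\n"
    "B;r;20;2020-01-01;2020-01-15\n"
)
df1 = pd.read_csv(file1, sep=";", parse_dates=["startdate", "enddate"])
df2 = pd.read_csv(file2, sep=";", parse_dates=["startdate", "enddate"])

out = join_overlapping(df1, df2)
print(out)
```

With real files you would pass "file1.csv" and "file2.csv" to pd.read_csv instead of the StringIO objects, and finish with out.to_csv("out.csv", sep=";", index=False). Note that the merge first builds every pair of rows sharing a productcode before the date filter runs, so if some product codes have many rows in both files, the intermediate result can get large. In that case reading file1 in pieces with the chunksize argument of read_csv and processing each piece separately may help.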