I have two CSV files, first.csv and second.csv. They have one column in common.
Example:
first: a b c d
second: x y a z
I have to create a third CSV file that looks like this:
third: a b c d x y z
The files do not have the same number of entries, and I must merge only the lines that have the same value in the shared column. Also, the paths of the three CSV files must be passed as parameters.
I was trying to do this in Java but Python would also work!
I don't really know what I should do :(
If the files always have exactly one shared column and you want to merge the records (lines) that have the same value in that column, then the following code might help you:
import pandas as pd


def merge_csv_files(first_file_path, second_file_path, output_file_path):
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    # column names that appear in both files
    shared_column = set(first_df.columns) & set(second_df.columns)

    # look for exactly 1 shared column
    if len(shared_column) != 1:
        raise ValueError("The CSV files do not have exactly one shared column.")
    shared_column = shared_column.pop()

    # inner join: keep only rows whose shared-column value occurs in both files
    merged_df = pd.merge(first_df, second_df, on=shared_column, how='inner')
    merged_df.to_csv(output_file_path, index=False)


first_file_path = 'first.csv'
second_file_path = 'second.csv'
output_file_path = 'third.csv'
merge_csv_files(first_file_path, second_file_path, output_file_path)
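
The question also asks for the three paths to be passed as parameters, while the paths above are hardcoded. Here is a rough sketch of one way to do that. Note that it swaps pandas for the standard library's csv module, so it is an alternative rather than the code above. The `main` function, the usage message, and the argument order (first, second, output) are my own assumptions, and it assumes every row has the same number of fields as its header.

```python
import csv
import sys


def merge_csv_files(first_path, second_path, output_path):
    """Inner-join two CSV files on their single shared column."""
    with open(first_path, newline="") as f:
        reader = csv.DictReader(f)
        first_cols = list(reader.fieldnames or [])
        first_rows = list(reader)
    with open(second_path, newline="") as f:
        reader = csv.DictReader(f)
        second_cols = list(reader.fieldnames or [])
        second_rows = list(reader)

    # Look for exactly one column name present in both headers.
    shared = [c for c in first_cols if c in second_cols]
    if len(shared) != 1:
        raise ValueError("The CSV files do not have exactly one shared column.")
    key = shared[0]

    # Index the second file by key value; a value may occur on several rows.
    by_key = {}
    for row in second_rows:
        by_key.setdefault(row[key], []).append(row)

    # Output columns: all of the first file, then the rest of the second.
    out_cols = first_cols + [c for c in second_cols if c != key]
    with open(output_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=out_cols)
        writer.writeheader()
        for row in first_rows:
            # Rows without a match in the second file are skipped (inner join).
            for match in by_key.get(row[key], []):
                writer.writerow({**match, **row})


def main(argv):
    """Hypothetical entry point: argv is [program, first, second, output]."""
    if len(argv) != 4:
        print(f"usage: {argv[0]} FIRST.csv SECOND.csv OUTPUT.csv", file=sys.stderr)
        return 2
    merge_csv_files(argv[1], argv[2], argv[3])
    return 0
```

To run it as a script, you could end the file with `if __name__ == "__main__": sys.exit(main(sys.argv))` and call it with the three paths, for example `python merge.py first.csv second.csv third.csv` (the script name is made up). If you would rather keep the pandas version, the same `main` wrapper should work around the pandas `merge_csv_files` unchanged.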