pythonjavacsvopencsv

Merge 2 csv files by common column


I have 2 csv files first.csv and second.csv. They both have a shared column.

Example: first : a b c d second: x y a z

I have to create a third csv file that looks like this: third : a b c d x y z

The files do not have the same number of entries, I must only merge the lines that share the same column. Also the paths of the 3 csv files must be sent as parameters.

I was trying to do this in Java but Python would also work!

I don't really know what I should do :(


Solution

  • If they will always have exactly 1 shared column and you want to merge the records (lines) that have the same value in that column, then the following code might help you:

    import pandas as pd
    
    def merge_csv_files(first_file_path, second_file_path, output_file_path):
        first_df = pd.read_csv(first_file_path)
        second_df = pd.read_csv(second_file_path)
    
        shared_column = set(first_df.columns) & set(second_df.columns)
        # look for exactly 1 shared column
        if len(shared_column) != 1:
            raise ValueError("The CSV files do not have exactly one shared column.")
        shared_column = shared_column.pop()
    
        merged_df = pd.merge(first_df, second_df, on=shared_column, how='inner')
    
        merged_df.to_csv(output_file_path, index=False)
    
    first_file_path = 'first.csv'
    second_file_path = 'second.csv'
    output_file_path = 'third.csv'
    
    merge_csv_files(first_file_path, second_file_path, output_file_path)