python pandas loops compare data-files

Compare columns in Pandas between two unequal size Dataframes for condition check


I have two pandas DataFrames of unequal sizes. For example:

DF1
id     value
a      2
b      3
c      22
d      5

DF2
id     value
c      22
a      2

Now I want to extract from DF1 the rows that have the same id as a row in DF2. My first approach is to run two nested for loops, something like:

x = []
# Compare every id in DF2 against every id in DF1: O(len(DF1) * len(DF2)) comparisons
for i in range(len(DF2)):
    for j in range(len(DF1)):
        if DF2['id'][i] == DF1['id'][j]:
            x.append(DF1.iloc[j])

This works, but with 400,000 lines in one file and 5,000 in the other, I need an efficient, Pythonic pandas way to do it.


Solution

  • import pandas as pd
    
    data1={'id':['a','b','c','d'],
           'value':[2,3,22,5]}
    
    data2={'id':['c','a'],
           'value':[22,2]}
    
    df1=pd.DataFrame(data1)
    df2=pd.DataFrame(data2)
    finaldf=pd.concat([df1,df2],ignore_index=True)
    

    Output after concat

       id   value
    0   a   2
    1   b   3
    2   c   22
    3   d   5
    4   c   22
    5   a   2
    

    Final output

    finaldf.drop_duplicates()
    
        id  value
    0   a   2
    1   b   3
    2   c   22
    3   d   5
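
Note that the concat plus drop_duplicates result above still contains rows b and d, whose ids do not appear in df2, so it returns all of df1 rather than only the matching rows. As an alternative, here is a minimal sketch that filters df1 with `Series.isin`, assuming the match should be on the `id` column only; the names `mask`, `matched` and `joined` are just illustrative. It avoids the nested loops, so it should scale much better to the 400,000 and 5,000 row case, though I have not timed it on the real files.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['a', 'b', 'c', 'd'],
                    'value': [2, 3, 22, 5]})
df2 = pd.DataFrame({'id': ['c', 'a'],
                    'value': [22, 2]})

# Boolean mask: True for each row of df1 whose id also appears in df2
mask = df1['id'].isin(df2['id'])

# Keep only the matching rows; df1's original order and index are preserved
matched = df1[mask]
print(matched)

# Roughly equivalent inner join on id (assumption: only df1's columns are wanted).
# drop_duplicates() guards against repeated ids in df2 multiplying rows.
joined = df1.merge(df2[['id']].drop_duplicates(), on='id', how='inner')
print(joined)
```

With the sample data, both `matched` and `joined` hold the rows with id a and c. If rows should only count as matches when both `id` and `value` agree, merging on both columns (`on=['id', 'value']`) would be the variant to try.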