python pandas data-analysis data-filtering

Updating and adding column values from one DataFrame to another based on specific conditions


I have two DataFrames, df_1 and df_2, which have different numbers of rows and columns. My df_1 looks like this:

           ID          Date  QN  RWS   RS  WI  eor
0       00603  202206200000   2    0  0.0   0  eor
1       00603  202206200010   2    0  0.0   0  eor
2       00603  202206200020   2    0  0.0   0  eor
...       ...           ...  ..  ...  ...  ..  ...
230007  19140  202206202340   2    0  0.0   0  eor
230008  19171  202206202350   2    0  0.0   0  eor

and df_2 looks like this:

        ID  Start_Date  End_Date  Station  Latitude  Longitude
0    00020    20040813  20220619      432   48.9219     9.9129
1    00040    20070209  20220620       44   47.9219     8.0129
2    00603    20070209  20220620       45   46.8319     7.5129
...    ...         ...       ...      ...       ...        ...
950  15813    20220301  20220619       40   48.9319     8.9129
951  19140    20201101  20220619      278   47.5310     9.0124
952  19171    20200901  20220619       13   49.9013     7.0809

df_1 does not have the Latitude, Longitude, Start_Date, and End_Date columns, so I want to add them based on the ID values: each row in df_1 should get the Latitude, Longitude, Start_Date, and End_Date of the row in df_2 that has the same ID.

How can I achieve this? Should I use a for loop that iterates over the rows?

I have the following logic in mind, but I don't know how to translate it into code:

for each row in df_1:
    if its ID matches an ID in df_2:
        add Latitude, Longitude, Start_Date and End_Date from that df_2 row to df_1

Any help would be highly appreciated. Thanks!


Solution

  • No loop is needed. You can use DataFrame.join(): set ID as the index of df_2, select the columns you want, and join on df_1's ID column:

    # Column names as they appear in the question's df_2 (Start_Date, End_Date)
    df_1 = df_1.join(df_2.set_index('ID')[['Start_Date', 'End_Date', 'Latitude', 'Longitude']], on='ID')
    

    Test code:

    import pandas as pd

    df_1 = pd.DataFrame({'ID': ['00603', '00603', '00603'], 'col_from_df1': [2, 2, 2]})
    df_2 = pd.DataFrame({'ID': ['00020', '00040', '00603'],
                         'Start_date': [20040813, 20070209, 20070209],
                         'End_date': [20220619, 20220620, 20220620],
                         'Station': [432, 44, 45],
                         'Latitude': [48.9219, 47.9219, 46.8319],
                         'Longitude': [9.9129, 8.0129, 7.5129]})
    print(df_1)
    print(df_2)
    # Index df_2 by ID, keep the wanted columns, and look up each df_1 row by its ID
    df_1 = df_1.join(df_2.set_index('ID')[['Start_date', 'End_date', 'Latitude', 'Longitude']], on='ID')
    print(df_1)
    

    Input:

    df_1:
          ID  col_from_df1
    0  00603             2
    1  00603             2
    2  00603             2
    
    df_2:
          ID  Start_date  End_date  Station  Latitude  Longitude
    0  00020    20040813  20220619      432   48.9219     9.9129
    1  00040    20070209  20220620       44   47.9219     8.0129
    2  00603    20070209  20220620       45   46.8319     7.5129
    

    Output:

          ID  col_from_df1  Start_date  End_date  Latitude  Longitude
    0  00603             2    20070209  20220620   46.8319     7.5129
    1  00603             2    20070209  20220620   46.8319     7.5129
    2  00603             2    20070209  20220620   46.8319     7.5129
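Since join(on='ID') performs a left join by default, DataFrame.merge() with how='left' should give the same result. The sketch below uses the same small frames as the test code (including its Start_date/End_date spelling) and assumes ID has the same dtype in both frames; here both are strings, which also preserves leading zeros such as 00603. The ID '12345' is a hypothetical value added only to show what happens when an ID has no match in df_2.

```python
import pandas as pd

# Same shape as the test frames; '12345' is a hypothetical ID that is NOT in df_2
df_1 = pd.DataFrame({'ID': ['00603', '00603', '12345'], 'col_from_df1': [2, 2, 2]})
df_2 = pd.DataFrame({'ID': ['00020', '00040', '00603'],
                     'Start_date': [20040813, 20070209, 20070209],
                     'End_date': [20220619, 20220620, 20220620],
                     'Station': [432, 44, 45],
                     'Latitude': [48.9219, 47.9219, 46.8319],
                     'Longitude': [9.9129, 8.0129, 7.5129]})

# Keep only the key plus the columns to copy over (Station is left out)
cols = ['ID', 'Start_date', 'End_date', 'Latitude', 'Longitude']

# how='left' keeps every df_1 row in its original order; IDs missing from df_2 get NaN.
# validate='many_to_one' raises a MergeError if an ID appears more than once in df_2.
result = df_1.merge(df_2[cols], on='ID', how='left', validate='many_to_one')
print(result)
```

Because the unmatched row gets NaN, Start_date and End_date are upcast to float64 in the result. The validate check is optional, but it guards against duplicated IDs in df_2 silently multiplying rows of df_1.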
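The question also asks whether a for loop with row iteration would work. It can, and a minimal sketch of that approach is below for comparison, again using the test code's frames and column names. It assumes each ID occurs only once in df_2, builds a plain dict lookup from df_2, and then walks over df_1's IDs one at a time.

```python
import pandas as pd

df_1 = pd.DataFrame({'ID': ['00603', '00603', '00603'], 'col_from_df1': [2, 2, 2]})
df_2 = pd.DataFrame({'ID': ['00020', '00040', '00603'],
                     'Start_date': [20040813, 20070209, 20070209],
                     'End_date': [20220619, 20220620, 20220620],
                     'Station': [432, 44, 45],
                     'Latitude': [48.9219, 47.9219, 46.8319],
                     'Longitude': [9.9129, 8.0129, 7.5129]})

cols = ['Start_date', 'End_date', 'Latitude', 'Longitude']

# Build a plain dict: ID -> tuple of the values to copy (assumes each ID occurs once in df_2)
lookup = {}
for row in df_2.itertuples(index=False):
    lookup[row.ID] = (row.Start_date, row.End_date, row.Latitude, row.Longitude)

# Walk over df_1 row by row; use None where an ID has no match in df_2
new_values = {col: [] for col in cols}
for row_id in df_1['ID']:
    match = lookup.get(row_id)
    for i, col in enumerate(cols):
        new_values[col].append(match[i] if match is not None else None)

# Attach the collected values as new columns of df_1
for col in cols:
    df_1[col] = new_values[col]

print(df_1)
```

This produces the same values as the join for matching IDs, but on a frame the size of df_1 (230,000+ rows) a Python-level loop is typically much slower than the vectorized join() shown above, which is why the join is the better choice.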