left_df
and right_df
:pd.DataFrame({'Column1': ['dummy_text1'],
'Column2': [''],
'Column3': [''],
'Column4': ['dummy_text2'],
'Column5': [''],
'Column6': ['Dummy Role 1']
})
left_df.merge(right_df,
on=['Column1', 'Column2', 'Column3', 'Column4', 'Column5', 'Column6'],
how='outer',
indicator=True)
the result contains two rows, one as 'left_only' and one as 'right_only', instead of a single row with 'both' as indicator.
I have already tried the following steps to address the issue:
drop_duplicates()
with inplace=True
and keep=False
on both DataFrames before the merge.Edit:
Sharing code snippet: I forgot to mention this part, this is to resolve case of the values:
left_df['Column1_U'] = left_df['Column1'].copy()
left_df['Column6_U'] = left_df['Column6'].copy()
right_df['Column1_U'] = right_df['Column1'].copy()
right_df['Column6_U'] = right_df['Column6'].copy()
this is the rest of the code:
columns_to_exclude = ['Column1_U', 'Column6_U']
left_df = left_df.apply(lambda s: s.str.lower() if s.name not in columns_to_exclude else s)
right_df = right_df.apply(lambda s: s.str.lower() if s.name not in columns_to_exclude else s)
merged_df = merge(left = left_df
,right = right_df
,on=['Column1', 'Column2', 'Column3', 'Column4', 'Column5', 'Column6']
,how='outer'
,indicator=True)
but the output that i'm getting is this: Column1,Column2,Column3,Column4,Column5,Column6,Column1_U_x,Column6_U_x,Column1_U_y,Column6_U_y,_merge dummy_text1,,,dummy_text1,,dummy role 1,Dummy_text1,Dummy Role 1,,,left_only dummy_text1,,,dummy_text1,,dummy role 1,,,Dummy_text1,Dummy Role 1,right_only
It is solved. The issue turned out to be with NULL
value and ''
, both these dataframes were getting created from SQL tables and in the query I was using ISNULL(Column, '')
as Column in one and NULL
AS Column in the other. Turns out after doing dataframe.fillna('')
, it was treating the '' differently between both dataframes, after I updated the queries the issue was resolved.