I have 3 dataframes that all share the same column names. For example:
df1
column1 column2 column3
a b c
d e f
df2
column1 column2 column3
g h i
j k l
df3
column1 column2 column3
m n o
p q r
Each dataframe has different values but the same columns. I tried append, concat, and an outer merge, but each attempt raised an error. Here's what I tried:
df_final = df1.append(df2, sort=True, ignore_index=True).append(df3, sort=True, ignore_index=True)
I also tried:
df_final = pd.concat([df1, df2, df3], axis=1)
But I get this error:
AssertionError: Number of manager items must equal union of block items # manager items: 61, # tot_items: 62
I've googled the error but I can't seem to understand why it's happening in my case. Any guidance is much appreciated!
I think the problem is duplicated column names in some or all of the DataFrames.
# simulate the error with duplicated column names
df1.columns = ['column3','column1','column1']
df2.columns = ['column5','column1','column1']
df3.columns = ['column2','column1','column1']
df_final = pd.concat([df1, df2, df3])
AssertionError: Number of manager items must equal union of block items # manager items: 4, # tot_items: 5
You can find the duplicated column names like this:
print (df3.columns[df3.columns.duplicated(keep=False)])
Index(['column1', 'column1'], dtype='object')
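To run the same check on several DataFrames at once, something like the following might help. It is only a sketch: the frames dictionary and its sample values are made up for illustration, with df3 given a repeated column name on purpose.

```python
import pandas as pd

# Hypothetical stand-ins for df1, df2 and df3; df3 repeats the name 'column1'.
frames = {
    "df1": pd.DataFrame([["a", "b", "c"]], columns=["column1", "column2", "column3"]),
    "df2": pd.DataFrame([["g", "h", "i"]], columns=["column1", "column2", "column3"]),
    "df3": pd.DataFrame([["m", "n", "o"]], columns=["column2", "column1", "column1"]),
}

# For each frame, collect the column names that occur more than once.
duplicated_names = {
    name: frame.columns[frame.columns.duplicated()].unique().tolist()
    for name, frame in frames.items()
}
print(duplicated_names)
```

Any frame with a non-empty list here is a candidate for the fixes below.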
One possible solution is to set the column names from a list:
df3.columns = ['column1','column2','column3']
print (df3)
column1 column2 column3
0 m n o
1 p q r
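Or, instead of renaming, remove the columns with duplicated names, keeping the first occurrence of each (shown here on df3 while it still has the duplicated names):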
df31 = df3.loc[:, ~df3.columns.duplicated()]
print (df31)
column2 column1
0 m n
1 p q
After that, concat or append should work as expected.
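Putting the pieces together, here is a minimal end-to-end sketch. It assumes the goal is to stack the rows of the three frames (the default axis=0 rather than axis=1), and it uses made-up data matching the question, with df3 deliberately given a duplicated name so the fix has something to do.

```python
import pandas as pd

cols = ["column1", "column2", "column3"]
df1 = pd.DataFrame([["a", "b", "c"], ["d", "e", "f"]], columns=cols)
df2 = pd.DataFrame([["g", "h", "i"], ["j", "k", "l"]], columns=cols)
# Assumed for illustration: df3 arrives with a duplicated column name.
df3 = pd.DataFrame([["m", "n", "o"], ["p", "q", "r"]],
                   columns=["column1", "column1", "column3"])

# Fix the duplicated names by assigning the intended names from a list.
df3.columns = cols

# Check that no frame still has duplicated column names before combining.
for frame in (df1, df2, df3):
    if frame.columns.duplicated().any():
        raise ValueError("duplicated column names remain")

# Stack the rows and renumber the index from 0.
df_final = pd.concat([df1, df2, df3], ignore_index=True)
print(df_final)
```

Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so pd.concat is the safer choice on recent versions.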