python pandas concatenation

Pandas Python: Concatenate dataframes having same columns


I have 3 dataframes with the same column names. For example:

df1
column1   column2   column3
a         b         c
d         e         f


df2
column1   column2   column3
g         h         i
j         k         l


df3
column1   column2   column3
m         n         o
p         q         r

Each dataframe has different values but the same columns. I tried append and concat, as well as an outer merge, but got errors. Here's what I tried:

df_final = df1.append(df2, sort=True, ignore_index=True).append(df3, sort=True, ignore_index=True)

I also tried: df_final = pd.concat([df1, df2, df3], axis=1)

But I get this error: AssertionError: Number of manager items must equal union of block items # manager items: 61, # tot_items: 62

I've googled the error but I can't seem to understand why it's happening in my case. Any guidance is much appreciated!


Solution

  • I think the problem is duplicated column names in some or all of the DataFrames.

    import pandas as pd
    
    # simulate the error: each DataFrame gets a duplicated 'column1' name
    df1.columns = ['column3','column1','column1']
    df2.columns = ['column5','column1','column1']
    df3.columns = ['column2','column1','column1']
    
    df_final = pd.concat([df1, df2, df3])
    

    AssertionError: Number of manager items must equal union of block items # manager items: 4, # tot_items: 5

    You can list the duplicated column names (keep=False marks every occurrence):

    print (df3.columns[df3.columns.duplicated(keep=False)])
    Index(['column1', 'column1'], dtype='object')
    

    One possible solution is to set the column names from a list:

    df3.columns = ['column1','column2','column3']
    print (df3)
      column1 column2 column3
    0       m       n       o
    1       p       q       r
    

    Or remove the duplicated columns, keeping only the first column with each name:

    df31 = df3.loc[:, ~df3.columns.duplicated()]
    print (df31)
      column2 column1
    0       m       n
    1       p       q
    

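    Then pd.concat([df1, df2, df3], ignore_index=True) should work. Keep the default axis=0 to stack the rows; axis=1 places the frames side by side. DataFrame.append also works on pandas versions before 2.0 (it was removed in 2.0).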
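
The steps above can be combined into one runnable sketch. It rebuilds the question's sample frames (an assumed reconstruction of the letters shown), simulates a duplicated name in df3, uses a hypothetical helper, duplicated_column_names, to report which frames are affected, applies both fixes, and then stacks the rows with pd.concat. The helper and the variable names df3_first_only and side_by_side are illustrative, not part of pandas.

```python
import pandas as pd

# Sample frames mirroring the question's data (assumed reconstruction).
cols = ["column1", "column2", "column3"]
df1 = pd.DataFrame([["a", "b", "c"], ["d", "e", "f"]], columns=cols)
df2 = pd.DataFrame([["g", "h", "i"], ["j", "k", "l"]], columns=cols)
df3 = pd.DataFrame([["m", "n", "o"], ["p", "q", "r"]], columns=cols)

# Simulate the problem: df3 ends up with a repeated column label.
df3.columns = ["column2", "column1", "column1"]


def duplicated_column_names(df):
    """Hypothetical helper: return each column label that occurs more than once."""
    return list(df.columns[df.columns.duplicated()].unique())


# Report which frames have duplicated labels.
for name, df in {"df1": df1, "df2": df2, "df3": df3}.items():
    dupes = duplicated_column_names(df)
    if dupes:
        print(f"{name} has duplicated column names: {dupes}")

# Fix option 2 (shown first, while df3 still has duplicates):
# keep only the first column with each name. This drops data from later duplicates.
df3_first_only = df3.loc[:, ~df3.columns.duplicated()]

# Fix option 1: reset the names from a list (assumes the intended order is known).
df3.columns = cols

# Stack rows: axis=0 (the default) aligns on column labels;
# ignore_index=True renumbers the result 0..n-1.
df_final = pd.concat([df1, df2, df3], ignore_index=True)
print(df_final)

# For contrast, axis=1 joins the frames side by side instead.
side_by_side = pd.concat([df1, df2, df3], axis=1)
print(side_by_side.shape)
```

Of the two fixes, renaming keeps every column, while the ~duplicated() filter silently drops the later duplicates (here, df3's third column), so renaming is usually the safer choice when the intended names are known.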