[SOLVED] Join pandas dataframes based on column values

Join pandas dataframes based on column values

I'm quite new to pandas dataframes, and I'm experiencing some troubles joining two tables.

The first df has just 3 columns:

DF1:

item_id    position    document_id
336        1           10
337        2           10
338        3           10
1001       1           11
1002       2           11
1003       3           11
38         10          146

And the second has exactly same two columns (and plenty of others):

DF2:

item_id    document_id    col1    col2   col3    ...
337        10             ...     ...    ...
1002       11             ...     ...    ...
1003       11             ...     ...    ...

What I need is to perform an operation which, in SQL, would look as follows:

DF1 join DF2 on 
DF1.document_id = DF2.document_id
and
DF1.item_id = DF2.item_id

And, as a result, I want to see DF2, complemented with column 'position':

item_id    document_id    position    col1   col2   col3   ...

What is a good way to do this using pandas?

Solution

I think you need merge with default inner join, but is necessary no duplicated combinations of values in both columns:

print (df2)
   item_id  document_id col1  col2  col3
0      337           10    s     4     7
1     1002           11    d     5     8
2     1003           11    f     7     0

df = pd.merge(df1, df2, on=['document_id','item_id'])
print (df)
   item_id  position  document_id col1  col2  col3
0      337         2           10    s     4     7
1     1002         2           11    d     5     8
2     1003         3           11    f     7     0

But if necessary position column in position 3:

df = pd.merge(df2, df1, on=['document_id','item_id'])
cols = df.columns.tolist()
df = df[cols[:2] + cols[-1:] + cols[2:-1]]
print (df)
   item_id  document_id  position col1  col2  col3
0      337           10         2    s     4     7
1     1002           11         2    d     5     8
2     1003           11         3    f     7     0