pythonpandasdataframejoinmerge

Pandas: Merge by common column retaining all other columns and sort by common column


I have two DataFrames, both of which have an x column and a y column. The x values are unique within each DataFrame, but there is some overlap between the two.

I want to merge the two DataFrames, keeping only the x values that appear in both DataFrames and copying the y values from each DataFrame into its own column (y_1 or y_2, depending on the source).

df1 = pd.DataFrame({'x': ['1', '2', '3', '4', '5', '6'],
                    'y': ['A', 'B', 'C', 'D', 'E', 'F']})

df2 = pd.DataFrame({'x': ['3', '4', '10', '2', '12', '6', '8', '9'],
                    'y': ['A', 'G', 'H', 'E', 'F', 'J', 'L', 'B']})

   x  y
0  1  A
1  2  B
2  3  C
3  4  D
4  5  E
5  6  F

    x  y
0   3  A
1   4  G
2  10  H
3   2  E
4  12  F
5   6  J
6   8  L
7   9  B

The result I want is:

   x y_1 y_2
0  3   A   C
1  4   G   D
2  2   E   B
3  6   J   F

Solution

  • You can do it with pd.merge

    pd.merge(df2, df1, on='x', how='inner', suffixes=('_1', '_2'))