I have an excel file with multiple sheets containing when and where employees went for a sale. The columns are different in all sheets. Example
Sheet 1
Date/Place PlaceA PlaceB PlaceD PlaceE PlaceF PlaceG
2019-03-01 A B C D E F
2019-03-02 A B C D E F
2019-03-03 A B C D E F
2019-03-08 A B C D E F
2019-03-05 A B C T E F
2019-03-06 G B K D N Y
Sheet 2
Date/Place PlaceE PlaceF PlaceG PlaceH PlaceI
2019-03-05 T E U A B
2019-03-06 X P Y N H
I need to find the duplicates, corresponding to the Date and Place and replace it with an @ symbol using python
The output will be an excel will the same sheets
Sheet 1
Date/Place PlaceA PlaceB PlaceD PlaceE PlaceF PlaceG
2019-03-01 A B C D E F
2019-03-02 A B C D E F
2019-03-03 A B C D E F
2019-03-08 A B C D E F
2019-03-05 A B C T E F
2019-03-06 G B K D N Y
Sheet 2
Date/Place PlaceE PlaceF PlaceG PlaceH PlaceI
2019-03-05 @ @ U A B
2019-03-06 X P @ N H
The data - T was in PlaceE on 2019-03-05 is seen in both Sheet1 and Sheet2, hence in Sheet2 it is replaced with @. There are multiple sheets in the file and the duplicates needs to checked across all sheets. Thanks in advance for the answer!
I tried using the df.duplicated function from pandas, but it checks only the same sheet and also, the since this is not direct row duplication, the duplication is based on the first column, it did not work out
Assuming df1
/df2
, you can align df1
on df2
and mask the cells that are different, except in Date/Place
:
tmp = df2.set_index('Date/Place')
out = tmp.mask(tmp.eq(df1.set_index('Date/Place')), '@').reset_index()
Or:
out = df2.mask(
df2[['Date/Place']].merge(df1, how='left')
.set_axis(df2.index)
.eq(df2).assign(**{'Date/Place': False}),
'@'
)
Output:
Date/Place PlaceE PlaceF PlaceG PlaceH PlaceI
0 2019-03-05 @ @ U A B
1 2019-03-06 X P @ N H