I have financial data where I need to find and save the rows in which the same value appears 2 or more times, with the condition that the value is not 0 (i.e. not less than 1).
Say I have this:
A B C D E F G H I
5/7/2025 21:00 0 0 0 0 0 0 0 0
5/7/2025 21:15 0 0 19598.8 0 19598.8 0 0 0
5/7/2025 21:30 0 0 0 0 0 0 0 0
5/7/2025 21:45 0 0 0 19823.35 0 0 0 0
5/7/2025 22:00 0 0 0 0 0 0 0 0
5/7/2025 22:15 0 0 0 0 0 0 0 0
5/7/2025 22:30 0 0 0 19975.95 0 19975.95 0 19975.95
5/7/2025 23:45 0 0 0 0 0 0 0 0
5/8/2025 1:00 0 0 19830.2 0 0 0 0 0
5/8/2025 1:15 0 0 0 0 0 0 0 0
5/8/2025 1:30 0 0 0 0 0 0 0 0
5/8/2025 1:45 0 0 0 0 0 0 0 0
I want this output, along with the other data in those rows:
A B C D E F G H I
5/7/2025 21:15 0 0 19598.8 0 19598.8 0 0 0
5/7/2025 22:30 0 0 0 19975.95 0 19975.95 0 19975.95
A simple approach could be to select the columns of interest, identify whether any value is duplicated within a row, then select the matching rows with boolean indexing:
# transpose so each original row becomes a column, then flag values duplicated within that row
mask = df.loc[:, 'B':].T
# keep the flags only where the value is >= 1, select rows where any flag remains
out = df[mask.apply(lambda x: x.duplicated(keep=False)).where(mask >= 1).any()]
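The same check can also be written row by row without transposing; a minimal sketch, assuming the numeric data lives in columns 'B' through 'I' (slower on large frames because of the Python-level loop in apply, but arguably easier to read):

# for each row, keep only the values >= 1 and test whether any of them is duplicated
vals = df.loc[:, 'B':]
keep = vals.apply(lambda r: r[r >= 1].duplicated(keep=False).any(), axis=1)
out = df[keep]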
A potentially more efficient approach could be to use numpy. Select the values, mask the values below 1, sort them, and identify if any 2 in a row are identical with diff + isclose:
import numpy as np
# keep only values >= 1 (others become NaN), sort each row so equal values become adjacent
mask = df.loc[:, 'B':].where(lambda x: x >= 1).values
mask.sort()
# a zero difference between neighbors means a duplicated value in that row
out = df[np.isclose(np.diff(mask), 0).any(axis=1)]
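Note that sorting pushes the NaNs (the masked values below 1) to the end of each row, and a difference involving NaN is never close to 0, so the masked values cannot produce false matches.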
Output:
A B C D E F G H I
1 5/7/2025 21:15 0 0 19598.8 0.00 19598.8 0.00 0 0.00
6 5/7/2025 22:30 0 0 0.0 19975.95 0.0 19975.95 0 19975.95
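Reproducible input used above (a minimal sketch; column A is assumed to hold the timestamps as parsed datetimes):

from io import StringIO
import pandas as pd

data = """A,B,C,D,E,F,G,H,I
5/7/2025 21:00,0,0,0,0,0,0,0,0
5/7/2025 21:15,0,0,19598.8,0,19598.8,0,0,0
5/7/2025 21:30,0,0,0,0,0,0,0,0
5/7/2025 21:45,0,0,0,19823.35,0,0,0,0
5/7/2025 22:00,0,0,0,0,0,0,0,0
5/7/2025 22:15,0,0,0,0,0,0,0,0
5/7/2025 22:30,0,0,0,19975.95,0,19975.95,0,19975.95
5/7/2025 23:45,0,0,0,0,0,0,0,0
5/8/2025 1:00,0,0,19830.2,0,0,0,0,0
5/8/2025 1:15,0,0,0,0,0,0,0,0
5/8/2025 1:30,0,0,0,0,0,0,0,0
5/8/2025 1:45,0,0,0,0,0,0,0,0"""

df = pd.read_csv(StringIO(data), parse_dates=['A'])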