I have a large dataset in a pandas DataFrame that looks like this:
| HR  | SBP | DBP | SepsisLabel | PatientID |
|-----|-----|-----|-------------|-----------|
| 92  | 120 | 80  | 0           | 0         |
| 98  | 115 | 85  | 0           | 0         |
| 93  | 125 | 75  | 0           | 0         |
| 95  | 130 | 90  | 0           | 1         |
| 102 | 120 | 80  | 1           | 1         |
| 109 | 115 | 75  | 1           | 1         |
| 94  | 135 | 100 | 0           | 2         |
| 97  | 100 | 70  | 0           | 2         |
| 85  | 120 | 80  | 0           | 2         |
| 88  | 115 | 75  | 0           | 3         |
| 93  | 125 | 85  | 1           | 3         |
| 78  | 130 | 90  | 1           | 3         |
| 115 | 140 | 110 | 0           | 4         |
| 102 | 120 | 80  | 0           | 4         |
| 98  | 140 | 110 | 0           | 4         |
I want to create a new DataFrame that keeps every row belonging to a PatientID with at least one SepsisLabel of 1, and excludes all rows for PatientIDs that never have a SepsisLabel of 1.

For example, PatientIDs 0, 2, and 4 never have a SepsisLabel of 1, so none of their rows should appear in the new DataFrame. Only PatientIDs 1 and 3, which each have at least one row with SepsisLabel of 1, should be included (keeping their SepsisLabel 0 rows as well).
How can I achieve this in Python using pandas?
Here is my initial attempt, but I am not sure whether it is correct or optimal:

```python
# My initial attempt
filtered_df = df[df['SepsisLabel'] == 1]
result_df = df[df['PatientID'].isin(filtered_df['PatientID'].unique())]
```
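For reference, here is a self-contained version of the data and my attempt (the values are transcribed by hand from the table above), which does seem to keep only patients 1 and 3:

```python
import pandas as pd

# Sample data, transcribed from the table in the question
df = pd.DataFrame({
    'HR': [92, 98, 93, 95, 102, 109, 94, 97, 85, 88, 93, 78, 115, 102, 98],
    'SBP': [120, 115, 125, 130, 120, 115, 135, 100, 120, 115, 125, 130, 140, 120, 140],
    'DBP': [80, 85, 75, 90, 80, 75, 100, 70, 80, 75, 85, 90, 110, 80, 110],
    'SepsisLabel': [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0],
    'PatientID': [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
})

# My initial attempt: find the label-1 rows, then keep all rows for those patients
filtered_df = df[df['SepsisLabel'] == 1]
result_df = df[df['PatientID'].isin(filtered_df['PatientID'].unique())]
print(result_df['PatientID'].unique().tolist())  # [1, 3]
```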
Any guidance or improvements would be greatly appreciated.
Use GroupBy.transform with GroupBy.any to test whether at least one value per group is True, then filter with boolean indexing:

```python
df1 = df[df['SepsisLabel'].eq(1).groupby(df['PatientID']).transform('any')]
```
Or select the PatientID values of the rows where SepsisLabel is 1, and keep those patients with Series.isin:

```python
df1 = df[df['PatientID'].isin(df.loc[df['SepsisLabel'].eq(1), 'PatientID'])]
```
For small data, or if performance is not important, it is possible to use DataFrameGroupBy.filter:

```python
df1 = df.groupby('PatientID').filter(lambda x: x['SepsisLabel'].eq(1).any())
```
```python
print(df1)
     HR  SBP  DBP  SepsisLabel  PatientID
3    95  130   90            0          1
4   102  120   80            1          1
5   109  115   75            1          1
9    88  115   75            0          3
10   93  125   85            1          3
11   78  130   90            1          3
```