Is there any potential downside to using the following code to create a new data frame, in which I specify exactly which information from the original data frame I want to see?
df_workloc = (df[df['WorkLoc'] == 'Home'][df['CareerSat'] == 'Very satisfied'][df['CurrencySymbol'] == 'USD'][df['CompTotal'] >= 50000])
I used the 2019 Stack Overflow survey data. As such:
WorkLoc specifies where a respondent works.
CareerSat specifies a respondent's career satisfaction.
CurrencySymbol specifies what currency a respondent gets paid in.
CompTotal specifies what a respondent's total compensation is.
If anyone has a cleaner, more efficient way of building a data frame with refined / specific information, I'd love to see it. One thing I'd like to do is specify a total compensation (CompTotal) of >= 50000 and <= 75000 in the same line. However, I get an error when I try to include the second boolean.
Thanks in advance.
I think you need to chain the conditions with & (bitwise AND) and filter by boolean indexing. For the last condition, use Series.between:
m1 = df['WorkLoc'] == 'Home'
m2 = df['CareerSat'] == 'Very satisfied'
m3 = df['CurrencySymbol'] == 'USD'
m4 = df['CompTotal'].between(50000, 75000)
df_workloc = df[m1 & m2 & m3 & m4]
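To try the masks without downloading the survey, here is a minimal sketch on a tiny made-up frame. The five rows are invented for illustration and only reuse the column names from the question; the real data will of course look different.

```python
import pandas as pd

# Hypothetical sample rows (not real survey data), using the question's column names.
df = pd.DataFrame({
    "WorkLoc": ["Home", "Home", "Office", "Home", "Home"],
    "CareerSat": ["Very satisfied", "Very satisfied", "Very satisfied",
                  "Slightly satisfied", "Very satisfied"],
    "CurrencySymbol": ["USD", "USD", "USD", "USD", "EUR"],
    "CompTotal": [60000, 90000, 70000, 55000, 65000],
})

# One boolean mask per condition; each is a Series aligned to df's index.
m1 = df["WorkLoc"] == "Home"
m2 = df["CareerSat"] == "Very satisfied"
m3 = df["CurrencySymbol"] == "USD"
m4 = df["CompTotal"].between(50000, 75000)  # both endpoints are included by default

# Combine the masks element-wise with & and filter once.
df_workloc = df[m1 & m2 & m3 & m4]
print(df_workloc)
```

Only the first row satisfies all four conditions here, so df_workloc ends up with a single row. Naming the masks also makes it easy to inspect each condition separately, for example with m4.sum().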
Or, as a one-line solution:
df_workloc = df[(df['WorkLoc'] == 'Home') &
(df['CareerSat'] == 'Very satisfied') &
(df['CurrencySymbol'] == 'USD') &
df['CompTotal'].between(50000, 75000)]
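If you would rather write the range as two explicit comparisons, a likely cause of the error (I am guessing, since the failing code is not shown) is missing parentheses or the use of Python's and. The & operator binds more tightly than >= and <=, so each comparison needs its own parentheses. Below is a small sketch on a made-up CompTotal column, with DataFrame.query shown as one more alternative:

```python
import pandas as pd

# Hypothetical values chosen to sit on and around both range boundaries.
df = pd.DataFrame({"CompTotal": [40000, 50000, 60000, 75000, 80000]})

# Explicit comparisons: wrap each one in parentheses before combining with &.
explicit = df[(df["CompTotal"] >= 50000) & (df["CompTotal"] <= 75000)]

# Series.between expresses the same inclusive range in a single call.
with_between = df[df["CompTotal"].between(50000, 75000)]

# DataFrame.query takes the condition as a string; "and" is allowed inside it.
with_query = df.query("CompTotal >= 50000 and CompTotal <= 75000")

print(explicit)
```

All three select the same rows in this sketch. Which one reads best is mostly a matter of taste; between is the shortest when both bounds are inclusive.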