I need the column names from the dataframe that contain both the term software and packages.
I'm able to filter out columns containing one string.. for eg:
software_cols = df.filter(regex='Software|software|SOFTWARE').columns
How do I achieve the same by mentioning 'Packages/packages/PACKAGES' as well.
Eligible column names should be like 'Local Software Packages', 'Software XYZ Packages', 'Software Package'
Keep things simple as you don't need a regex here, just use two boolean masks and a case independent comparison:
# does the column name contain "software"?
m1 = df.columns.str.contains('software', case=False)
# does it contain "package"?
m2 = df.columns.str.contains('package', case=False)
# if both conditions are met, keep the column
out = df.loc[:, m1&m2]
Example input:
df = pd.DataFrame(columns=['Local Software Packages', 'Software XYZ Packages', 'Software Package', 'Other', 'Software only'], index=[0])
Output:
Local Software Packages Software XYZ Packages Software Package
0 NaN NaN NaN
If you just want the names:
df.columns[m1&m2]
# Index(['Local Software Packages', 'Software XYZ Packages', 'Software Package'], dtype='object')