python pandas dataframe indexing querying

Getting indices of rows given a query of header names and respective values in Pandas


I am given a dataframe, a subset of its headers, and values for those columns. I want to find the indices of the rows that contain the values of interest in the columns of interest, without explicitly typing the column names and values, i.e. without writing df.index[(df['BoolCol'] == VALUE) & (df['BoolCol2'] == VALUE2)], because I won't know the headers and values in advance and they will change every so often. I'm not sure how to do this when the column names and values cannot be hard-coded and are only available as variables holding a list of headers and a list of values.

Code Summary/Example:

df:
    Pretreat  Setup
0        3.0    0.5
1        3.0    0.5
2        3.0    3.0
3        3.0    3.0
4        3.0    5.0
5        3.0    5.0
6        3.0    0.5
7        3.0    0.5

query_labels = ['Pretreat', 'Setup'] # querying against 2 columns, Pretreat and Setup
query_values = [(3.0, 0.5)] # looking for indices where Pretreat == 3.0 and Setup == 0.5 (in order of query_labels)

#Expecting:
{(3.0, 0.5): [0, 1, 6, 7]}
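
For reproducibility, the frame printed above can be rebuilt as follows. This is only a sketch of the setup: the values are copied from the printout, and the default integer index is assumed.

```python
import pandas as pd

# Rebuild the example frame shown above (default RangeIndex 0..7 assumed)
df = pd.DataFrame({
    "Pretreat": [3.0] * 8,
    "Setup": [0.5, 0.5, 3.0, 3.0, 5.0, 5.0, 0.5, 0.5],
})

query_labels = ["Pretreat", "Setup"]  # columns to query against
query_values = [(3.0, 0.5)]           # one tuple per query, in query_labels order
```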


Solution

  • You can convert the query values to a Series and check for equality across the columns:

    s = pd.Series(query_values[0], index=query_labels)
    
    # restrict to the queried columns so any other columns cannot block a match
    df[df[query_labels].eq(s).all(axis=1)].index
    

    Output:

    Int64Index([0, 1, 6, 7], dtype='int64')
    

    If there are several value tuples in query_values:

    out = {k: df[df[query_labels].eq(pd.Series(k, index=query_labels)).all(axis=1)].index.to_list()
           for k in query_values}
    

    Output: {(3.0, 0.5): [0, 1, 6, 7]}
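
Putting the pieces together, here is a minimal self-contained sketch of the approach. The helper name find_indices and the extra Note column are hypothetical, added only to illustrate that the comparison should be limited to the queried columns; the second query tuple is likewise just an illustration.

```python
import pandas as pd

def find_indices(df, query_labels, query_values):
    """Map each value tuple to the index labels of the rows matching it."""
    out = {}
    for values in query_values:
        # Label the values with the column names so eq() aligns them by column
        target = pd.Series(values, index=query_labels)
        # Compare only the queried columns; a row matches if every one is equal
        mask = df[query_labels].eq(target).all(axis=1)
        out[values] = df.index[mask].to_list()
    return out

df = pd.DataFrame({
    "Pretreat": [3.0] * 8,
    "Setup": [0.5, 0.5, 3.0, 3.0, 5.0, 5.0, 0.5, 0.5],
    "Note": list("abcdefgh"),  # hypothetical extra column, not in the question
})

result = find_indices(df, ["Pretreat", "Setup"], [(3.0, 0.5), (3.0, 5.0)])
print(result)  # {(3.0, 0.5): [0, 1, 6, 7], (3.0, 5.0): [4, 5]}
```

Without the df[query_labels] selection, the unqueried Note column would compare as False in every row and no row would match.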