pandasnumpypytableshdfstorenumexpr

Numexpr/PyTables: how to pass perform multiple condition queries from a list/array?


I am doing a query (i.e. selecting multiple rows based on some condition) using PyTables, with the functions tables.Table.read() and tables.Table.read_where(). This is essentially based on numpy and pandas with NumExpr:

http://www.pytables.org/usersguide/tutorials.html http://www.pytables.org/cookbook/hints_for_sql_users.html https://github.com/PyTables/PyTables/blob/6782047b9223897fd59ff4967d71d7fdfb474f16/tables/table.py

In the "hints for sql users", the example for selecting multiple rows at once is this:

rows = tbl.read_where('(sqrt(x**2 + y**2) <= 1) & (temperature < 100)')

Let's say I would prefer to do as query as follows: all rows that equal temperature 100 or equal temperature 90

rows = tbl.read_where('(temperature == 100) | (temperature == 90)')

This works perfectly. But I would like to pass through a list/array of "temperature values" to accomplish this task.

temperatures = [80, 90, 100]
# reads in temperatures
# performs this query: 
rows = tbl.read_where('(temperature == 80) | (temperature == 90) | (temperature == 100)')

Is this possible? The idea would be I would write a function whereby users input a list of values to query, and it performs an OR query for each.


Solution

  • One possible solution is create expression by list comprehension:

    temperatures = [80, 90, 100]
    
    cond = '|'.join(['(temperature == ' + str(x) + ')' for x in temperatures])
    print (cond)
    (temperature == 80)|(temperature == 90)|(temperature == 100)