When using a pushdown predicate with an AWS Glue DynamicFrame, how does it iterate through a list?
For example, the following lists were created to be used in a pushdown predicate:
day=list(p_day.select('day').toPandas()['day'])
month=list(p_month.select('month').na.drop().toPandas()['month'])
year=list(p_year.select('year').toPandas()['year'])
predicate = "day in (%s) and month in (%s) and year in (%s)"%(",".join(map(lambda s: "'"+str(s)+"'",day))
,",".join(map(lambda s: "'"+str(s)+"'",month))
,",".join(map(lambda s: "'"+str(s)+"'",year)))
Let's say it returns this:
"day in ('07','15') and month in ('11','09','08') and year in ('2021')"
How would the push down predicate read this combination/list?
Is it:
day | month | year |
---|---|---|
07 | 11 | 2021 |
15 | 11 | 2021 |
07 | 09 | 2021 |
15 | 09 | 2021 |
07 | 08 | 2021 |
15 | 08 | 2021 |
-OR-
day | month | year |
---|---|---|
07 | 11 | 2021 |
15 | 11 | 2021 |
15 | 08 | 2021 |
15 | 09 | 2021 |
I have a feeling that the predicate is read like the first table rather than the second, but it's the second that I would like to pass as the pushdown predicate. Does building separate lists essentially produce every combination of their values? It's as if the true day, month, and year combinations are lost in the lists; they should be 11/07/2021, 11/15/2021, 08/15/2021, and 09/15/2021.
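This has nothing to do with Glue itself, since the partition predicate is just a basic Spark SQL boolean expression. Each "in (...)" clause is evaluated on its own and the three are combined with "and", so any partition whose day, month, and year each appear in their respective lists will match. You will therefore receive the first table (all six combinations) and not the second. To receive the second, you would have to restructure the boolean expression so that each day/month/year combination is its own "and" group, with the groups joined by "or".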
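As a minimal sketch of that restructuring, the predicate can be built from a list of (day, month, year) tuples instead of three independent lists. The helper name `build_partition_predicate` and the hard-coded `dates` list are hypothetical; in practice the tuples would presumably come from one DataFrame that still holds the three columns together on each row.

```python
def build_partition_predicate(dates):
    """Build an OR-of-ANDs predicate from (day, month, year) tuples.

    Hypothetical helper: each tuple becomes its own parenthesised
    "and" group, so only the listed combinations match.
    """
    clauses = [
        "(day = '{}' and month = '{}' and year = '{}')".format(d, m, y)
        for d, m, y in dates
    ]
    return " or ".join(clauses)


# Assumed input: the four combinations from the second table.
dates = [
    ("07", "11", "2021"),
    ("15", "11", "2021"),
    ("15", "08", "2021"),
    ("15", "09", "2021"),
]

predicate = build_partition_predicate(dates)
print(predicate)
```

The resulting string can be passed as the pushdown predicate in place of the original one. Note that an empty `dates` list yields an empty string, which would not filter anything, so that case may need separate handling.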