dataframe · pyspark · aws-glue

Extract column values from a DataFrame and pass them into a PySpark SQL WHERE clause


I am extracting data from a backend table into a DataFrame and want to retrieve the values of one column (for example the "ID" column) as a plain list, then pass that list of ID values into a SQL query for another data extraction. I tried the line below, but it returns an array of Row objects like the one pasted here:

row_list = df.select('Column_header').collect()

What I would like to get instead is a flat list like this:

[val1,val2,val3.....]

I also tried RDD with map and flatMap, but I keep getting syntax errors even when the format looks correct. Need help here.


Solution

  • There's an extra step missing: collect() returns a list of Row objects, so you still need to pull the field out of each Row:

    row_list = df.select('Column_header').collect()
    result = [row['Column_header'] for row in row_list]
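To show the end-to-end pattern of feeding that list into a follow-up SQL query, here is a minimal sketch. It simulates `collect()`'s output with plain dicts so it runs without a Spark session; in PySpark, `Row` objects support the same `row['col']` access, and the table name `other_table` is a hypothetical placeholder:

```python
# Stand-in for row_list = df.select('ID').collect() — in PySpark each
# element would be a Row, which also supports row['ID'] indexing.
row_list = [{"ID": 101}, {"ID": 102}, {"ID": 103}]

# Flatten the Row objects into a plain list of values.
ids = [row["ID"] for row in row_list]

# Build an IN clause for the second query (numeric IDs only;
# string values would need quoting/escaping, ideally via parameters).
in_clause = ", ".join(str(v) for v in ids)
query = f"SELECT * FROM other_table WHERE ID IN ({in_clause})"
```

With a real session you would then run `spark.sql(query)`. An equivalent way to flatten the column without a list comprehension is `df.select('ID').rdd.flatMap(lambda x: x).collect()`.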