python-3.x · pyspark · apache-spark-sql

Not able to select more than 255 columns from a PySpark DataFrame


I am trying to select 500 columns from a PySpark DataFrame and am getting the error "SyntaxError: more than 255 arguments":

Df2 = Df\
  .select("col1","col2","col3",...............,"col500")

I also tried the approach below, but it did not work either.

cols = ["col1","col2","col3",...............,"col500"]
Df2 = Df\
     .select(cols)

Both approaches work for fewer than 255 columns.

Note: my Python version is 3.6.

Please advise. Thanks.


Solution

  • After a conversation with @pissall, here are two working ways to select more than 255 columns (a runnable sketch of both follows after the cases):

    Case 1:

    cols = ["col1","col2","col3",...............,"col500"]
    df2 = df.select(cols)
    

    Case 2:

    df.createOrReplaceTempView("df")
    df2 = spark.sql("SELECT col1, col2, ..., col500 FROM df")