I'm reading a SQL table in a notebook on Synapse and loading it into a PySpark DataFrame:
df = spark.read.synapsesql("dbtablename")
Unfortunately some columns have a space in their name e.g.: Job Title.
I tried different methods to rename the columns and remove the spaces. The new DataFrames show the fixed column names, but when I try to convert to a Polars or pandas DataFrame the operation fails, and the error message refers back to the original column name (with the space) from the first DataFrame.
I also tried with:
sqlContext.registerDataFrameAsTable(df, "tablename")
and then:
df_new = sqlContext.sql("select JobTitle from tablename")
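For context, the renaming attempts all boil down to mapping each column name to a space-free version. A minimal pure-Python sketch of that mapping (PySpark itself isn't shown running here; the `toDF` line in the comment is the hypothetical Spark-side application):

```python
def sanitize(name: str) -> str:
    """Replace spaces in a column name with underscores."""
    return name.replace(" ", "_")

# In PySpark this would typically be applied as (not executed here):
#   df = df.toDF(*[sanitize(c) for c in df.columns])
print(sanitize("Job Title"))  # → Job_Title
```

As described above, renames applied this way on the Spark side still failed on conversion, which is why the server-side alias in the answer below was needed.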
Thank you for your kind help.
I was able to solve this by changing how I query the table on the dedicated SQL pool.
Before:
df = spark.read.synapsesql("db.tablename")
After:
# Import the connector's Constants class for the server/database options
from com.microsoft.spark.sqlanalytics.Constants import Constants

df = (spark.read
    .option(Constants.SERVER, "server_name.sql.azuresynapse.net")
    .option(Constants.DATABASE, "db_name")
    .synapsesql("select [Job Title] as JobTitle from dbo.TableName")
)
pandas_df = df.toPandas()
Thank you for your help. I tried again with the previously shared code and with a new pool and fresh resources, but I still had the same issue. This solution lets me rename the column.
Note that:
select [Job Title] from dbo.TableName
does not work when converting to a pandas DataFrame, or even when simply calling:
df.show()
So it is the column alias without a space that solves it.
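If many columns contain spaces, the aliasing can be generated rather than written by hand. A small sketch that builds the T-SQL SELECT list, aliasing only the spaced names (`build_select`, the sample column names, and `dbo.TableName` are illustrative, not from any library):

```python
def build_select(columns, table):
    """Build a T-SQL SELECT that aliases spaced column names to space-free ones."""
    parts = []
    for c in columns:
        if " " in c:
            # e.g. [Job Title] as JobTitle
            parts.append(f"[{c}] as {c.replace(' ', '')}")
        else:
            parts.append(f"[{c}]")
    return f"select {', '.join(parts)} from {table}"

query = build_select(["Job Title", "Salary"], "dbo.TableName")
print(query)  # → select [Job Title] as JobTitle, [Salary] from dbo.TableName
```

The resulting string would then be passed to `.synapsesql(query)` exactly as in the answer above.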
Thank you.