I have a rather strange issue in a managed PySpark environment hosted on EMR 6.10.1.
When running this query:
spark.sql("select 1 as a, a+a as b, b+b as d").show()
On my local machine, on Databricks, and on every other PySpark instance I get the expected results.
However, when I run the same query on the EMR cluster I get: pyspark.sql.utils.AnalysisException: Column 'a' does not exist. Did you mean one of the following? []
Does anyone know which setting causes this behavior?
This feature is called lateral column alias references, and it was introduced in Spark 3.4. EMR 6.10 ships Spark 3.3, which is why the query raises the exception. Upgrading to an EMR release that bundles Spark 3.4 or later (EMR 6.12+) makes the original query work.