I'm trying to JOIN two tables in a DataBricks Notebook. The first line in the SQL statement is erroring-out.
I can't determine why. The docs I've read say its typically due to a typo. But that is not the case for me (at least not that I can see).
Error in SQL statement: ParseException:
no viable alternative at input 'WITH mgt '(line 1, pos 8)
== SQL ==
WITH xxx AS(
--------^^^
This can be caused by characters that look like ordinary space (0x20), but are something different. Unicode has quite a lot of them and it happens, especially on copy-paste, that some weird formatting corrupts SQL query string.
For example:
val sql = "WITH mgt AS(select 1) select * from mgt"
spark.sql(sql)
org.apache.spark.sql.catalyst.parser.ParseException:
no viable alternative at input 'WITH mgt '(line 1, pos 8)
== SQL ==
WITH mgt AS(select 1) select * from mgt
--------^^^
Why did that occur? We can discover by looking at exact byte representation of SQL:
sql.getBytes("utf-8").map("%02X".format(_)).mkString
57495448206D6774E2808041532873656C6563742031292073656C656374202A2066726F6D206D6774
^^^^^^
Marked sequence is 0xE28080
- which is En Quad, not a space. You can backspace it and type again to fix it.