Probably a silly issue, but I don't get it. I'm working in a Jupyter Notebook with Python 3.6 and Spark 2.4, hosted on IBM Watson Studio.
I have a simple csv file:
num,label
0,0
1,0
2,0
3,0
And to read it I use the following command:
labels = spark.read.csv(url, sep=',', header=True)
But when I check whether labels is correct using labels.head(), I get Row(PAR1Љ��L�Q�� ='\x08\x00]').
What am I missing?
This looks like it's due to an encoding issue.
Try providing an encoding through the reader option; also try UTF-8.
labels = spark.read.option("encoding", "ISO-8859-1").csv(url, sep=',', header=True)
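Note that option() has to be set on the reader before calling csv(), since csv() already returns a DataFrame. Equivalently, the encoding can be passed as a keyword argument to csv(); a quick look at the schema and the first rows will show whether the file now parses. A minimal sketch, assuming url is the same path as in the question:

# Same as chaining .option("encoding", ...) before .csv()
labels = spark.read.csv(url, sep=',', header=True, encoding='ISO-8859-1')

labels.printSchema()  # should list the num and label columns
labels.show(4)        # should print the four data rows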