Question: Why I am getting following error on the last line of the code below, how the issue can be resolved?
AttributeError: 'DataFrame' object has no attribute 'OrderID'
CSV File encoding: UTF-16 LE BOM
Number of columns: 150
Rows: 5000
Language etc.: Python, Apache Spark, Azure-Databricks
MySampleDataFile.txt:
FirstName~LastName~OrderID~City~.....
Kim~Doe~1234~New York~...............
Bob~Mason~456~Seattle~...............
..................
Code sample:
from pyspark.sql.types import DoubleType
df = spark.read.option("encoding","UTF-16LE").option("multiline","true").csv("abfss://mycontainder@myAzureStorageAccount.dfs.core.windows.net/myFolder/MySampleDataFile.txt", sep='~', escape="\"", header="true", inferSchema="false")
display(df.limit(4))
df1 = df.withColumn("OrderID", df.OrderID.cast(DoubleType()))
Output of display(df.limit(4)) It successfully displays the content of df in a tabular format with column header - similar to the example here:
---------------------------------------
|FirstName|LastName|OrderID|City|.....|
---------------------------------------
|Kim~Doe|1234|New York|...............|
|Bob|Mason|456|Seattle|...............|
|................ |
---------------------------------------
AttributeError: 'DataFrame' object has no attribute 'OrderID'
how the issue can be resolved?
You can try the following way to change the data type
.
df1 = df.withColumn("OrderID", df[“OrderID”].cast(DoubleType()))
OR - Alternative way,
pyspark.sql.functions.col
It will return a column depending on the name of the provided column
.
from pyspark.sql.functions import col
df1 = df.withColumn("OrderID", col("OrderID").cast(DoubleType()))