I have a string like the one below:
Gourav , Joshi ,"Karnataka, India" ,,"gouravj09@hotmail,gouravj09@gmail.com"
Is there any way to convert this into a DataFrame in Spark, where each comma-separated value becomes a new column?
The final DataFrame should look like this:
Note: this string is generated inside a loop, and each iteration produces a new string. I have to append each string to the DataFrame after splitting it on the comma separator.
Usually you want Spark to read data from a file, or even from a set of files, so that it can process them in parallel. As already suggested in the comments, spark.read.csv is what you should use to read a CSV file.
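The reason a CSV reader is needed rather than a plain split on commas is that two of the fields in the question's string are quoted and contain commas themselves. The following stdlib-only sketch (plain Python, not Spark) compares str.split(",") with Python's csv module on that exact line. The naive split produces too many pieces, while a quote-aware CSV parser keeps the quoted fields together.

```python
import csv
import io

# The example line from the question
line = 'Gourav , Joshi ,"Karnataka, India" ,,"gouravj09@hotmail,gouravj09@gmail.com"'

# Naive split: also breaks the quoted fields at their inner commas
naive = line.split(",")
print(len(naive))  # → 7

# csv.reader treats commas inside double quotes as part of the field
parsed = next(csv.reader(io.StringIO(line)))
print(len(parsed))  # → 5
```

Python's csv module and Spark's CSV reader are different parsers and may treat the whitespace around quoted fields differently. The point here is only the quote handling that both of them provide.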
I used temporary files in the examples below only to give you a self-contained working example. In real cases I recommend writing to a real file.
You can pass a schema to the csv function or include a header row in your file. If neither a schema nor a header is provided, Spark names the columns _c0, _c1, ..., _cN.
import tempfile
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# 1. No schema and no header: Spark names the columns _c0, _c1, ...
with tempfile.NamedTemporaryFile(delete=False) as fp:
    fp.write(b"""Gourav , Joshi ,"Karnataka, India" ,,"gouravj09@hotmail,gouravj09@gmail.com" \n""")
    fp.close()  # flush and release the file so Spark can open it (delete=False keeps it on disk)
spark.read.csv(fp.name).show()

# 2. Explicit schema passed as a DDL-formatted string
with tempfile.NamedTemporaryFile(delete=False) as fp:
    fp.write(b"""Gourav , Joshi ,"Karnataka, India" ,,"gouravj09@hotmail,gouravj09@gmail.com" \n""")
    fp.close()
spark.read.csv(fp.name, schema="Name string, Surname string, Address string, Phone string, Email string").show()

# 3. Header row in the file: column names are taken from the first line
with tempfile.NamedTemporaryFile(delete=False) as fp:
    fp.write(b"""Name,Surname,Address,Phone,Email\n""")
    fp.write(b"""Gourav , Joshi ,"Karnataka, India" ,,"gouravj09@hotmail,gouravj09@gmail.com" \n""")
    fp.close()
spark.read.csv(fp.name, header=True).show()
Output of the three calls, in order:

+-------+-------+----------------+----+--------------------+
|    _c0|    _c1|             _c2| _c3|                 _c4|
+-------+-------+----------------+----+--------------------+
|Gourav | Joshi |Karnataka, India|NULL|gouravj09@hotmail...|
+-------+-------+----------------+----+--------------------+

+-------+-------+----------------+-----+--------------------+
|   Name|Surname|         Address|Phone|               Email|
+-------+-------+----------------+-----+--------------------+
|Gourav | Joshi |Karnataka, India| NULL|gouravj09@hotmail...|
+-------+-------+----------------+-----+--------------------+

+-------+-------+----------------+-----+--------------------+
|   Name|Surname|         Address|Phone|               Email|
+-------+-------+----------------+-----+--------------------+
|Gourav | Joshi |Karnataka, India| NULL|gouravj09@hotmail...|
+-------+-------+----------------+-----+--------------------+
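For the loop described in the question, one way to follow the "write a real file" advice is to append each generated line to a single file inside the loop, then call spark.read.csv once at the end instead of creating a separate DataFrame on every iteration. This is a sketch under assumptions. The generate_rows() function is hypothetical and stands in for whatever code produces the strings. The Spark call is left as a comment so that the file-writing part runs with the standard library alone. A quick csv-module check confirms that each line has five columns.

```python
import csv
import os
import tempfile


def generate_rows():
    """Hypothetical stand-in for the question's loop: yields one CSV line per iteration."""
    for i in range(3):
        yield f'Gourav{i} , Joshi ,"Karnataka, India" ,,"gouravj09@hotmail,gouravj09@gmail.com"'


# Write every generated line to one file as the loop runs
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", newline="") as f:
    f.write("Name,Surname,Address,Phone,Email\n")  # header row, so header=True can be used
    for line in generate_rows():
        f.write(line + "\n")

# The file is left on disk; read it once with Spark afterwards, e.g.:
# df = spark.read.csv(path, header=True)

# Stdlib sanity check: every row (header included) has 5 columns
with open(path, newline="") as f:
    rows = list(csv.reader(f))
print(len(rows), [len(r) for r in rows])
```

Reading one file at the end keeps the Spark side to a single read with a single schema. For very large outputs you could write several files into one directory and point spark.read.csv at the directory.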