I'm using SQLContext to read in a CSV file like this:
val csvContents = sqlContext.sql(
  "SELECT * FROM csv.`src/test/resources/afile.csv` WHERE firstcolumn=21")
But it prints the first column as _c0 and treats the header row as data. How do I set the header and still use a SQL query? I've seen this solution:
val df = spark.read
.option("header", "true") //reading the headers
.csv("file.csv")
But that doesn't let me run a SELECT query with a WHERE clause. Is there a way to specify the CSV header and still run a SQL SELECT query?
It turns out the header wasn't being parsed correctly. The CSV file was tab-delimited, so I had to specify the delimiter explicitly (the direct csv.`...` query in the question also appears to ignore reader options entirely, which is why the columns came back as _c0):
val csvContents = sqlContext.read
  .option("delimiter", "\t") // the file is tab-separated, not comma-separated
  .option("header", "true")  // use the first row as column names
  .csv(csvPath)
  .where("col_id = 22")      // filter on a header-derived column