With Kedro, how can I define the column names when reading a spark.SparkDataSet? Below is my catalog.yaml:
user-playlists:
  type: spark.SparkDataSet
  file_format: csv
  filepath: data/01_raw/lastfm-dataset-1K/userid-timestamp-artid-artname-traid-traname.tsv
  load_args:
    sep: "\t"
    header: False
    # schema:
    #   filepath: conf/base/playlists-schema.json
  save_args:
    index: False
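If hand-writing the schema JSON proves brittle, one option (a sketch, not a Kedro-specific API) is to build the schema in PySpark and let it serialise itself; the output path below is the commented-out filepath from the catalog:

from pyspark.sql.types import StructField, StructType, StringType

# Build the schema programmatically; every column in this TSV is a string.
columns = ["userid", "timestamp", "artid", "artname", "traid", "traname"]
schema = StructType([StructField(name, StringType(), nullable=True) for name in columns])

# StructType.json() emits exactly the JSON form Spark can deserialise,
# including the per-field "metadata" key.
with open("conf/base/playlists-schema.json", "w") as f:
    f.write(schema.json())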
I have been trying to use the following schema, but it is not accepted; loading fails with a schema error: "Please provide a valid JSON-serialised 'pyspark.sql.types.StructType'."
{
  "fields": [
    {"name": "userid", "type": "string", "nullable": true},
    {"name": "timestamp", "type": "string", "nullable": true},
    {"name": "artid", "type": "string", "nullable": true},
    {"name": "artname", "type": "string", "nullable": true},
    {"name": "traid", "type": "string", "nullable": true},
    {"name": "traname", "type": "string", "nullable": true}
  ],
  "type": "struct"
}
This works, apparently because each field carries the "metadata" key that pyspark's StructField.fromJson expects:
{"fields":[
{"metadata":{},"name":"userid","nullable":true,"type":"string"},
{"metadata":{},"name":"timestamp","nullable":true,"type":"string"},
{"metadata":{},"name":"artistid","nullable":true,"type":"string"},
{"metadata":{},"name":"artistname","nullable":true,"type":"string"},
{"metadata":{},"name":"traid","nullable":true,"type":"string"},
{"metadata":{},"name":"traname","nullable":true,"type":"string"}
],"type":"struct"}