How do I create a schema for reading the JSON below? I am using hiveContext.read.schema().json("input.json"), and I want to ignore the first two fields, "ErrorMessage" and "IsError", and read only "Report". Below is the JSON:
{
"ErrorMessage": null,
"IsError": false,
"Report":{
"tl":[
{
"TlID":"F6",
"CID":"mo"
},
{
"TlID":"Fk",
"CID":"mo"
}
]
}
}
I created the schema below:
val schema = StructType(
Array(
StructField("Report", StructType(
Array(
StructField
("tl",ArrayType(StructType(Array(
StructField("TlID", StringType),
StructField("CID", IntegerType)
)))))))))
Below is the output of json.printSchema():
root
|-- Report: struct (nullable = true)
| |-- tl: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- TlID: string (nullable = true)
| | | |-- CID: integer (nullable = true)
The schema is incorrect. CID in your data is clearly a String ("mo"), not an Integer, so it has to be declared as StringType. Use
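import org.apache.spark.sql.types._

// Only "Report" is declared, so "ErrorMessage" and "IsError" are ignored.
val schema = StructType(Array(
  StructField("Report", StructType(Array(
    StructField("tl", ArrayType(StructType(Array(
      StructField("CID", StringType),   // was IntegerType in the question
      StructField("TlID", StringType)
    ))))
  )))
))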
and:
import spark.implicits._  // needed for .toDS

val df = Seq("""{
"ErrorMessage": null,
"IsError": false,
"Report":{
"tl":[
{
"TlID":"F6",
"CID":"mo"
},
{
"TlID":"Fk",
"CID":"mo"
}
]
}
}""").toDS
spark.read.schema(schema).json(df).show(false)
+--------------------------------+
|Report |
+--------------------------------+
|[WrappedArray([mo,F6], [mo,Fk])]|
+--------------------------------+
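If you then want one row per entry of tl instead of a single nested Report column, something like the following might work. This is only a sketch: the object name FlattenReport and the helper flatten are made up for illustration, it assumes Spark 2.2 or later (for json on a Dataset[String]) and a local SparkSession, and it uses explode from org.apache.spark.sql.functions to unpack the array.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.{col, explode}
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

// Hypothetical helper object, not part of Spark or of the original answer.
object FlattenReport {

  // Same shape as the corrected schema: only "Report" is declared,
  // so "ErrorMessage" and "IsError" are never read.
  val schema: StructType = StructType(Array(
    StructField("Report", StructType(Array(
      StructField("tl", ArrayType(StructType(Array(
        StructField("TlID", StringType),
        StructField("CID", StringType)
      ))))
    )))
  ))

  // Reads the JSON strings with the schema, then turns each element
  // of Report.tl into its own row with TlID and CID as plain columns.
  def flatten(spark: SparkSession, json: Dataset[String]): DataFrame =
    spark.read.schema(schema).json(json)
      .select(explode(col("Report.tl")).as("tl"))
      .select(col("tl.TlID"), col("tl.CID"))

  def main(args: Array[String]): Unit = {
    // Local session is an assumption for the sake of a runnable example.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("flatten-report")
      .getOrCreate()
    import spark.implicits._

    val input = Seq(
      """{"ErrorMessage": null, "IsError": false, "Report": {"tl": [{"TlID": "F6", "CID": "mo"}, {"TlID": "Fk", "CID": "mo"}]}}"""
    ).toDS

    flatten(spark, input).show(false)
    spark.stop()
  }
}
```

With the sample document from the question this should give two rows, one for each element of tl. If you are reading from a file, spark.read.schema(FlattenReport.schema).json("input.json") can replace the Dataset[String] input; note that a pretty-printed, multi-line file may additionally need option("multiLine", true) on newer Spark versions.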