json apache-spark apache-spark-sql

How to parse nested JSON objects in Spark SQL?


I have a schema as shown below. How can I parse the nested objects?

root
 |-- apps: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- appName: string (nullable = true)
 |    |    |-- appPackage: string (nullable = true)
 |    |    |-- Ratings: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- date: string (nullable = true)
 |    |    |    |    |-- rating: long (nullable = true)
 |-- id: string (nullable = true)

Solution

  • Assuming you read a JSON file into a DataFrame and print its schema (the one shown in the question) like this:

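    // Spark 1.x Java API; sqlContext is an existing org.apache.spark.sql.SQLContext
    DataFrame df = sqlContext.read().json("/path/to/file");
    df.registerTempTable("df");   // makes the data queryable as table "df"
    df.printSchema();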
    

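    Then you can select nested fields with dot notation. Note that "element" in the printed schema is only a label for the array's element type, not a field name, so the path to the app name is apps.appName:

    // The column is named "apps" in the schema (not "app")
    DataFrame apps = df.select("apps");
    apps.registerTempTable("apps");
    apps.printSchema();
    apps.show();
    // Selecting a struct field through an array yields an array of that field's values
    DataFrame appName = apps.select("apps.appName");
    appName.registerTempTable("appName");
    appName.printSchema();
    appName.show();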
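
Selecting a field through an array keeps the values packed in an array. If you want one row per rating instead, one possible approach is to explode the two array levels in turn. The following is a minimal sketch against the same Spark 1.x Java API used above (on Spark 2.x and later the Java type is Dataset<Row> rather than DataFrame). The class name FlattenRatings, the method name flatten, and the intermediate aliases app and r are hypothetical names chosen for illustration; only the column names come from the schema in the question.

```java
import org.apache.spark.sql.DataFrame;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.explode;

public class FlattenRatings {

    // Hypothetical helper: returns one row per (id, app, rating) combination.
    // Assumes df has the schema printed in the question.
    public static DataFrame flatten(DataFrame df) {
        // Step 1: one row per element of the "apps" array.
        DataFrame perApp = df.select(col("id"), explode(col("apps")).as("app"));

        // Step 2: one row per element of each app's "Ratings" array.
        // Only one explode is used per select.
        DataFrame perRating = perApp.select(
                col("id"),
                col("app.appName").as("appName"),
                col("app.appPackage").as("appPackage"),
                explode(col("app.Ratings")).as("r"));

        // Step 3: lift the rating struct's fields to top-level columns.
        return perRating.select(
                col("id"),
                col("appName"),
                col("appPackage"),
                col("r.date").as("date"),
                col("r.rating").as("rating"));
    }
}
```

Calling FlattenRatings.flatten(df).show() should then print flat columns id, appName, appPackage, date and rating. Keep in mind that explode produces no output rows for a null or empty array, so users without apps, or apps without ratings, will not appear in the result.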