jsonf#deedlefsharp.data.typeproviders

F# and Web API Json for analysis in DataFrame


Im trying to learn F# by rewriting some scripts from python, where I query a graphql endpoint and load the json into a Pandas DataFrame for cleaning and analysis:

json like this:

apiResponse = {"data":
                    {"Shipments":{"ErrorMessage":null,
                    "Success":true,
                    "ValidationResult":null,
                    "TotalCount":494,
                    "Data":[
                       {"Building":"B7",
                        "Comment":"Test",
                        "CompletedDate":"2021-04-12T10:13:13.436Z",
                        "ItemId":"dd4520bb-aa0a-...",
                        "NoOfUnit":5,
                        "Organization":{
                              "OrganizationId":"cac43a32-1f08-...",
                              "OrganizationName":"XXX"},
                       "Adress":"Road 5"
                      },
                      {"Building":"B7",
                      ....}]}}}

Python:

data = request.json()
#only the "Data" path
json_list = data['data']['Shipments']['Data']
df = json_normalize(json_list)

The same approach with using Fsharp.Data, while "file.json" is only the "Data[]" part ( I tried to create a fiddle, but I could not get it to run. here:


type ApiTypes = JsonProvider<"file.json"> //where file.json has only "Data" path of apiResponse
let jsonResponse = JsonValue.Parse(apiResponse)
let data = jsonResponse?data
let Shipments = data?Shipments
let Data = Shipments?Data

let input = 
 ApiTypes.Parse(Data.ToString())

let df = Frame.ofRecords input

But this is not working. So my questions:

  1. Is this the right way to work with this json?
  2. Is there a better way to create a DataFrame with json?

Any help apreciated. Thanks


Solution

  • I don't have any experience with Deedle, but I think ofRecords requires an actual static type to work correctly, so it may not be compatible with JsonProvider. (I couldn't get it to work, at least.)

    Instead, I would define the type by hand and then deserialize to it, like this:

    open Newtonsoft.Json.Linq
    
    type Datum =
        {
            Building : string
            Comment : string
            CompletedDate : DateTime
            ItemId : Guid
            NoOfUnit : int
            Organization :
                {|
                    OrganizationId : Guid
                    OrganizationName : string
                |}
            Adress : string
        }
    
    let jobj = JObject.Parse(apiResponse)
    let frame =
        jobj.["data"].["Shipments"].["Data"].Children()
            |> Seq.map (fun jtok -> jtok.ToObject<Datum>())
            |> Frame.ofRecords