json, youtube-api, azure-data-factory, youtube-analytics-api

Azure Data Factory is unable to read schema of YouTube Reporting API JSON file


I am using the YouTube Analytics and Reporting API to pull performance data from a YouTube channel and store it programmatically with Azure Data Factory (ADF). From the YouTube API, I get the following JSON output format:

{
  "kind": "youtubeAnalytics#resultTable",
  "columnHeaders": [
    {
      "name": string,
      "dataType": string,
      "columnType": string
    },
    ... more headers ...
  ],
  "rows": [
    [
      {value}, {value}, ...
    ]
  ]
}

See link here: https://developers.google.com/youtube/analytics/reference/reports/query#response
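For context, the file lands in blob storage roughly like this; a minimal sketch assuming the google-api-python-client and azure-storage-blob packages, with the credentials, dates, metrics, container, and connection string as placeholders:

import json

from azure.storage.blob import BlobClient
from googleapiclient.discovery import build

credentials = ...  # obtained via the usual google-auth OAuth flow (omitted)

# Build the YouTube Analytics v2 client and run the report query.
youtube_analytics = build("youtubeAnalytics", "v2", credentials=credentials)
response = youtube_analytics.reports().query(
    ids="channel==MINE",
    startDate="2020-01-01",   # placeholder dates
    endDate="2020-01-31",
    metrics="views,estimatedMinutesWatched",
    dimensions="day",
).execute()

# Save the raw resultTable response as a JSON blob for ADF to pick up.
blob = BlobClient.from_connection_string(
    conn_str="<connection-string>",   # placeholder
    container_name="reports",
    blob_name="report.json",
)
blob.upload_blob(json.dumps(response), overwrite=True)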

The response of this GET request is saved as a JSON file in blob storage. As a next step, I want to copy the data from this JSON file into an Azure SQL database. At this point, Azure Data Factory is unable to read the JSON's schema correctly. The "array within an array" syntax and the missing keys in the "rows" section (unlike in "columnHeaders", e.g. "name": string) are apparently not readable; see screenshot: JSON schema in ADF

Evidently, ADF only supports the arrayOfObjects pattern: https://learn.microsoft.com/en-gb/azure/data-factory/format-json

If I manually change the syntax of the "rows" section to match the "columnHeaders" format, the schema is read correctly, which proves to me that ADF cannot process nested arrays. As this should be an automated process, the manual workaround is not an option and was for testing purposes only.
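Since that manual fix is mechanical (pairing each row value with its column name), one option is to automate it in a small pre-processing step, e.g. an Azure Function that rewrites the blob before the copy activity runs. A sketch, where transform_response and the file names are hypothetical:

import json

def transform_response(raw: dict) -> list:
    """Convert the resultTable format (rows as arrays of bare values)
    into the arrayOfObjects layout that ADF can map to SQL columns."""
    names = [h["name"] for h in raw["columnHeaders"]]
    return [dict(zip(names, row)) for row in raw["rows"]]

# Read the raw API response (in practice, from the blob instead of disk).
with open("report.json") as f:
    raw = json.load(f)

# Write the flattened array of objects back out for ADF to consume.
with open("report_flat.json", "w") as f:
    json.dump(transform_response(raw), f)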

Can anyone please tell me how to get Azure to read the JSON file properly and copy the data into the SQL database? Using a data flow did not look very promising to me, but maybe it has data transformation features I am not aware of. Thanks for helping me out!


Solution

  • Actually, Data Factory can read the JSON file properly and copy the data into the SQL database. The limitation is that not every JSON file can be loaded into an Azure SQL database directly.

    For a nested JSON file like this one, we first need to do the schema conversion manually with a Data Flow; only then can the data be loaded correctly. In my experience, we usually need to create two sources pointing to the same JSON file, extract the array data we need from each, and then join them together to get the expected output.

    Your JSON file is nested and the value of "rows" is an array of arrays, so these steps are required; otherwise we will get errors and the pipeline won't work. The sketch below shows what the join effectively computes.
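    To make the join concrete: pairing each row value with its column header is, in plain Python terms (not Data Flow script), the following. The sample values are hypothetical; only the shape comes from the question:

    # Hypothetical parsed response (shape taken from the question):
    raw = {
        "columnHeaders": [
            {"name": "day", "dataType": "STRING", "columnType": "DIMENSION"},
            {"name": "views", "dataType": "INTEGER", "columnType": "METRIC"},
        ],
        "rows": [["2020-01-01", 123], ["2020-01-02", 456]],
    }

    headers = raw["columnHeaders"]   # first source: the column metadata
    rows = raw["rows"]               # second source: the value arrays
    records = [{h["name"]: v for h, v in zip(headers, row)} for row in rows]
    # -> [{"day": "2020-01-01", "views": 123}, {"day": "2020-01-02", "views": 456}]
    # This array-of-objects shape maps one-to-one onto SQL columns in the sink.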

    HTH.