jsonsql-serverazure-data-factorydata-transform

Join multiple tables in Azure Data Factory to create a structure with an embedded array field?


I am trying to set up an Azure Data Factory transformation. I have a SQL Server database with three tables: Students, StudentClasses and Classes. I would like to use Azure Data Factory to read these tables and create a JSON structure. The structure is an array of Students. A Student should have a derived field which is an array of Classes. StudentClasses is a mapping that tells which Classes were taken by a given student. It has two fields StudentId and ClassId. I think I should be able to add a derived column to Student that is an array of Classes created using the join transformation. I have gotten as far as setting up the data flows for each of the tables. However, I can't figure out how to embed the array of classes in student. I have tried adding a derived array field to Student. However, it is not clear how to populate it with the content I desire.

I can add a derived field called class to Class with the following expression:

array(classID, className)

The end result I am looking for is:

{
  "students": [
    {
      "studentID": 1,
      "studentName": "John Doe",
      "classes": [
        {
          "classID": "CS101-01",
          "className": "Introduction to Computer Science"
        },
        {
          "classID": "CS102-01",
          "className": "Mathematics for Computer Science"
        }
      ]
    },
    {
      "studentID": 2,
      "studentName": "Jane Smith",
      "classes": [
        // ... (similar structure for Jane Smith)
      ]
    }
    // ... (additional students with their respective classes)
  ]
}

Solution

  • Array(classID, className) expression in derived column transformation will create an array of ClassID, ClassName. But this will not create as proper key name in key:value format created. Also, this will not be in the nested format under each student. To solve this, you can give expression like collect(@(ClassId=ClassId,ClassName=ClassName)) in the aggregate transformation. Below is the detailed approach.

    enter image description here

    Data Preview of select transformation after joining all three tables:

    StudentId StudentName ClassId ClassName
    1 John Doe CS101-01 Introduction to Computer Science
    1 John Doe CS102-01 Mathematics for Computer Science
    2 Jane Smith CS101-01 Introduction to Computer Science
    2 Jane Smith CS103-01 Data Structures and Algorithms
    3 Bob Johnson CS102-01 Mathematics for Computer Science
    3 Bob Johnson CS104-01 Database Systems

    enter image description here

    Data preview of aggregate transformation:

    enter image description here

    enter image description here

    Data preview of Aggragate2:

    enter image description here

    Output Json:

    {
      "Students": [
        {
          "StudentId": "1",
          "StudentName": "John Doe",
          "Classes": [
            {
              "ClassId": "CS101-01",
              "ClassName": "Introduction to Computer Science"
            },
            {
              "ClassId": "CS102-01",
              "ClassName": "Mathematics for Computer Science"
            }
          ]
        },
        {
          "StudentId": "2",
          "StudentName": "Jane Smith",
          "Classes": [
            {
              "ClassId": "CS101-01",
              "ClassName": "Introduction to Computer Science"
    ...........
            }
        ]
        }
    }