azure-data-factoryazure-synapseazure-data-lake

Xml pagination in Azure Data Factory


I have an API that returns data in XML format, which includes pagination. I want to store this data in Azure Blob Storage as JSON. In the JSON, I only want to extract the entry element with the following details, along with the pagination URL:

d: COMPANY : "POLIZEI" d: STREET_NAME : "Groner Landstr." d: HOUSE_NO : 51 d: POST_CODE : 37081 d: CITY : "Göttingen" d: COUNTRY_CODE : "DE"

Using the Copy activity with HTTP as the linked service in Azure Data Factory, the data is converted to JSON. Here are some screenshots showing the XML response returned by the API, the pagination URL at the end of the response, and the ADF preview:

Xml response returned by API xml response returned by API pgination url at end of respnse

ADF is converted response into JSON Adf preview

After running the Copy activity, the file created in Blob Storage is in JSON format, even though I explicitly set the file extension to XML when creating the dataset. Here is the output I am getting:

But this is the output that i am getting pagination elements required in json

I tried iterating through all pages using the Until activity, but the output from the Lookup activity (reading the json from Blob) does not find the link element where I want to extract the pagination link. The expression I used is:

@activity('lookupActivity').output.value[0].links[?(@.rel=='next')].href

Here are my questions:

1.  What other solution can I implement to handle pagination, considering the output from the HTTP web service is in XML format?
2.  If I use the Copy activity with the HTTP web service as the source and JSON format in sink , how can I correctly map the fields ? What should I pass in the collection reference to extract only these specific fields and the pagination link.One field is under link array and other fields are in entry array

Copy activity mapping

d: COMPANY : "POLIZEI" d: STREET_NAME : "Groner Landstr." d: HOUSE_NO : 51 d: POST_CODE : 37081 d: CITY : "Göttingen" d: COUNTRY_CODE : "DE"

Link: next href: https//….


Solution

  • What other solution can I implement to handle pagination, considering the output from the HTTP web service is in XML format?

    AFAIK there is not built-in support for pagination in HTTP connector of ADF.

    So, to paginate your URL that is generating output as xml you can First store the result of your URL into Json file using copy activity with the next page URL link and after this you can read the next page URL link from that Json file using lookup activity and proceed the pagination something like this you can try.

    If I use the Copy activity with the HTTP web service as the source and JSON format in sink, how can I correctly map the fields? What should I pass in the collection reference to extract only these specific fields and the pagination link. One field is under link array and other fields are in entry array

    To map all your required field, you need to enable Advance editor in Mapping to select your all-required field from that XML. enter image description here

    For next page link select @href under first link array and then change the array index to 1. enter image description here enter image description here