azure-data-factorysharepoint-onlinemicrosoft-fabric

Fabric datafactory copy data to ingest Sharepoint files


I'm looking at MS Fabric to:

  1. Ingest data from multiple sources including xls / csv sharepoint online files and azure SQL databases.
  2. Store this data in onelake
  3. Use synapse (or a workflow in data factory) to process the various sources data into a data warehouse
  4. use dataflows to read the DW data into smaller datasets to be used by multiple power bi reports.

My initial problem is how to read the sharepoint online files.

I can do this easily in the power bi service with a dataflow by choosing a web api connector, and pasting in the file path from sharepoint and choosing my organisational account to authenticate.

However, as I understand it, I can't then write that dataflow to onelake, to do that, I need to use Data Factory connection / copy data tasks.

How?

There are loads fewer connectors / data sources in DataFactory, so I assume that I need to use one of the generic protocols (http?) of the copy data activity, or the web activity to connect to my source sharepoint file(s)??

I've tried doing that, and using the sharepoint file path as the url, but I'm not getting anywhere - basically I haven't got a clue how to do it, and I can't find documentation that's helpful.

The closest I can find is how to access a sharepoint online list, but I don't need that, I need to access the actual files.

Has anyone got an idiots guide of how to do this?

Cheers


Solution

  • First of all, to connect to your sharepoint you need to make sure you are authorized, accessing a sharepoint hosted on another tenant from another tenants ADF will need further steps to make it possible.

    I understand sharepoint lists are not the thing you want, but actual files. I also understand that the documentation seems to be confusing when working with sharepoint, this list and file dichotomy is confusing to many. To copy from Sharepoint, you still need to follow the process listed on documentation, such as registering the app, and use a web activity to authenticate, then passing to a connected copy activity with HTTP connector as source. I want to give this to you as visually as possible so I tried looking for a video how to set it up, there is a video that I liked which discusses how to do it with power automate for a few minutes in beginning, and spends the rest of the video explaining this process in ADF. By the way, the Azure equivalent of Power Automate is Logic Apps, you can also use it to achieve this task. Hope it helps:

    https://www.youtube.com/watch?v=fhtwW3FXf3g

    https://www.youtube.com/watch?v=FFfNu3cI-uw (this video might help but i havent gone through it)