I am trying to move files between Azure Data Factory and SharePoint Online.
I have some test data files in a SharePoint location and I have a pipeline in Data Factory containing a Web Activity and a Copy data activity. The Copy data activity then links to a Data Lake as its sink.
My desired outcome is for the Copy data activity to be granted access to the SharePoint Online API and then proceed to move the retrieved data into the data lake. As it stands, it has the bearer token but can't access the data through the SharePoint Online API.
I have followed the Microsoft guide here, including the prerequisites.
I am getting the following error in Data Factory:
Error code 2200
"Failure happened on 'Sink' side. ErrorCode=UserErrorInvalidCredentialToReadHttpFile,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The credential to read http file is invalid.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Net.WebException,Message=The remote server returned an error: (401) Unauthorized.,Source=System,
The error is raised by the Copy data activity (which uses an HTTP linked service), not the Web Activity. So I am successfully getting a bearer token, which is passed to the Copy data activity, only to be denied.
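For reference, the token request made by the Web Activity can be sketched as below. This is a minimal sketch, assuming the ACS token endpoint and form-body layout described in the Microsoft guide; `build_token_request` and all argument values are hypothetical placeholders, not taken from my pipeline.

```python
from urllib.parse import urlencode

# Assumption: well-known principal ID that identifies SharePoint Online in ACS.
SHAREPOINT_PRINCIPAL = "00000003-0000-0ff1-ce00-000000000000"


def build_token_request(tenant_id, tenant_name, client_id, client_secret):
    """Return (url, form_body) for a client-credentials token request.

    Assumed layout: the resource must name the SharePoint host and tenant,
    otherwise the token is issued for a different audience.
    """
    url = f"https://accounts.accesscontrol.windows.net/{tenant_id}/tokens/OAuth/2"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": f"{client_id}@{tenant_id}",
        "client_secret": client_secret,
        "resource": f"{SHAREPOINT_PRINCIPAL}/{tenant_name}.sharepoint.com@{tenant_id}",
    })
    return url, body


if __name__ == "__main__":
    # Placeholder values only; substitute real IDs before sending the request.
    url, body = build_token_request("<tenant-id>", "my_company", "<app-id>", "<key>")
    print(url)
    print(body)
```

The body is sent as `application/x-www-form-urlencoded`. If the `resource` value does not name the SharePoint host being called, a token can still be issued but will be rejected later.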
Troubleshooting:
I have assigned Owner role access to the data factory for the data lake.
I have registered the service principal app as per the prerequisites in the Microsoft guide above, and I can successfully use the Application ID, Key and Tenant ID to obtain the bearer token.
I have granted SharePoint Online full access to the registered service app; this succeeded, as I can see it listed in the trusted apps in SharePoint. I did so not only here:
https://my_company.sharepoint.com/_layouts/15/appinv.aspx
but also here:
https://my_company.sharepoint.com/sites/folder/_layouts/15/appinv.aspx
I have granted full API permissions to SharePoint and Microsoft Graph in the registered Service app.
I have replicated the process in Postman, only to get the same result: I retrieve a bearer token and then get a 401 error when sending a GET request for the files using that token.
The error in Postman is: {"error_description":"Exception of type 'Microsoft.IdentityModel.Tokens.AudienceUriValidationFailedException' was thrown."}
I've read here that the alternative, an OData linked service in Data Factory, is not possible using basic authentication. So I tried following this article, which uses an AAD service principal with a certificate, but ran into further problems when trying to generate the certificate. Perhaps this is my best bet.
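The `AudienceUriValidationFailedException` returned in Postman suggests that the `aud` claim in the bearer token does not match what SharePoint expects. A minimal sketch for checking this, assuming the token is a standard three-part JWT; `jwt_claims` is a hypothetical helper and it does not verify the signature, so it is only suitable for inspection:

```python
import base64
import json


def jwt_claims(token):
    """Decode the payload (second segment) of a JWT without verifying it."""
    payload = token.split(".")[1]
    # base64url segments in a JWT omit padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


if __name__ == "__main__":
    # Paste the access token returned by the Web Activity or Postman here.
    token = "<paste-bearer-token>"
    if token.count(".") == 2:
        print(jwt_claims(token).get("aud"))
```

If the printed audience does not mention the SharePoint host being called (for example `my_company.sharepoint.com`), the `resource` value used when requesting the token is the first thing to check.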
Most relevant posts I've used:
How to move sharepoint list or excel file to azure sql dw?
http://www.ktskumar.com/2017/01/access-sharepoint-online-using-postman/
So I found the solution to my question: use an Azure Logic App to move the files from SharePoint Online to Azure storage.
This was infinitely easier than using Data Factory, and I wish someone had told me that a while ago.
Reference found here