Tags: microsoft-information-protection, azure-information-protection, mip-sdk

What are the tradeoffs between the different MIP SDK options for the bulk assignment of sensitivity labels?


Given the current status of the MIP SDK, and the fact that assigning sensitivity labels with the Graph SDK is in a sort of "public preview" state, what are the current limitations around bulk assigning sensitivity labels with the Graph SDK?

Some context: our current technology stack is integrated with the Graph SDK, and for our use case we are fine with the "public preview" status of this API endpoint. However, the rollout of the preview does not yet seem to be complete (we haven't yet received a response to the form on that page). Integrating the C++/C# File SDK solutions would require a great deal of additional work, especially since our initial experiments with the Java wrapper haven't been successful.

[Image: SDK overview from the presentation]

Above are the current integration options from this presentation.

If we need to bulk assign sensitivity labels to 10,000 documents, can we use the Microsoft Graph SDK for this, or must we use the File SDK? Can this be done through service account connections (e.g. using a client_id and client_secret)? How efficiently can the File SDK assign sensitivity labels, and are there any limitations? What about bulk assigning labels to 10 million documents?

EDIT 20/09/2022: In response to a question below: we are trying to assign sensitivity labels to files stored in Microsoft 365 SharePoint and OneDrive. To read and write other types of metadata on these files, we have so far accessed them through the Graph SDK with client secret credentials.
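For reference, the app-only (client secret) access described above comes down to an OAuth 2.0 client-credentials token request against the Microsoft identity platform. The following is a minimal sketch of building that request with the Python standard library, assuming placeholder tenant, client ID, and secret values. Whether a given Graph endpoint, including assignSensitivityLabel, accepts such an app-only token depends on the application permissions listed in that endpoint's documentation.

```python
from __future__ import annotations

from urllib.parse import urlencode

# Microsoft identity platform v2.0 token endpoint; {tenant} is a placeholder
# for the tenant ID or verified domain.
TOKEN_URL_TEMPLATE = "https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token"


def build_client_credentials_request(
    tenant_id: str, client_id: str, client_secret: str
) -> tuple[str, str]:
    """Return (url, form-encoded body) for an app-only Microsoft Graph token request."""
    url = TOKEN_URL_TEMPLATE.format(tenant=tenant_id)
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            # '.default' asks for the application permissions already
            # granted (with admin consent) to this app registration.
            "scope": "https://graph.microsoft.com/.default",
        }
    )
    return url, body
```

The body is POSTed with `Content-Type: application/x-www-form-urlencoded`, and the JSON response carries an `access_token` to send as a Bearer token on subsequent Graph calls.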


Solution

  • The MIP SDK isn't an ideal solution for files at rest in SharePoint or OneDrive if performance is a concern. Labeling a file requires downloading it in full, applying the label, and then replacing the file in the service. That adds latency in both directions (download and upload), on top of the cost of applying protection.
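The three sequential costs described above can be made visible by timing each step per file. Here is a minimal sketch, where `download`, `apply_label`, and `upload` are hypothetical callables you would supply (for example, Graph content download and upload calls, and a wrapper around the MIP File SDK). Nothing here is part of either SDK.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class RoundTripTiming:
    download_s: float
    label_s: float
    upload_s: float

    @property
    def total_s(self) -> float:
        # Every file pays all three costs, one after another.
        return self.download_s + self.label_s + self.upload_s


def label_via_round_trip(
    item_id: str,
    download: Callable[[str], bytes],       # hypothetical: fetch file bytes from the service
    apply_label: Callable[[bytes], bytes],  # hypothetical: label/protect locally (e.g. MIP File SDK)
    upload: Callable[[str, bytes], None],   # hypothetical: replace the file in the service
) -> RoundTripTiming:
    """Download, label, and re-upload one file, timing each sequential step."""
    t0 = time.perf_counter()
    data = download(item_id)
    t1 = time.perf_counter()
    labeled = apply_label(data)
    t2 = time.perf_counter()
    upload(item_id, labeled)
    t3 = time.perf_counter()
    return RoundTripTiming(t1 - t0, t2 - t1, t3 - t2)
```

Because the steps are strictly sequential per file, the network legs usually dominate `total_s` for files in SharePoint or OneDrive, which is the overhead that local-disk labeling avoids.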

    For files on local disk, I've written samples that can apply labels to hundreds of files per minute (the rate varies with file size and protection status). Unfortunately, the added overhead of extracting each file from ODSP (OneDrive and SharePoint), labeling it, and putting it back makes that rate unachievable.
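To compare "hundreds of files per minute" against the volumes in the question, a back-of-the-envelope estimate helps. The sketch below assumes an illustrative rate of 200 files per minute per worker (not a measured figure) and perfectly parallel workers. The ODSP round trip will push the real per-worker rate lower.

```python
def labeling_time_hours(num_files: int, files_per_minute: float, workers: int = 1) -> float:
    """Wall-clock hours to label num_files at a constant per-worker rate.

    Assumes workers scale perfectly in parallel, an optimistic simplification
    that ignores service throttling and per-file size differences.
    """
    if files_per_minute <= 0 or workers <= 0:
        raise ValueError("rate and worker count must be positive")
    minutes = num_files / (files_per_minute * workers)
    return minutes / 60
```

At the assumed 200 files per minute, 10,000 files take about 50 minutes on one worker. By contrast, 10 million files take about 833 hours (roughly 35 days). Ten parallel workers would cut the latter to about 83 hours, provided the service does not throttle them first.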

    Instead, you should look at these APIs: https://www.linkedin.com/pulse/programatic-way-apply-sensitivity-label-file-sanjoyan-mustafi/

    When it comes to labeling in Graph, the label APIs are a subset of, or a property on, the resource you're trying to label. Rather than calling a MIP API that targets a driveItem, you'll set a property on the driveItem itself.
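Concretely, the Graph beta documentation for driveItem: assignSensitivityLabel describes a POST on the driveItem with a small JSON body (`sensitivityLabelId`, `assignmentMethod`, and an optional `justificationText`). The following is a minimal sketch that only builds the URL and body, assuming the beta endpoint and placeholder drive, item, and label IDs. Sending it (with a Bearer token and `Content-Type: application/json`) is left to whatever HTTP client or Graph SDK you already use.

```python
from __future__ import annotations

import json
from typing import Optional

# Assumption: the beta endpoint, as in the Graph docs for assignSensitivityLabel.
GRAPH_BETA = "https://graph.microsoft.com/beta"

# sensitivityLabelAssignmentMethod values (excluding the unknownFutureValue sentinel).
ASSIGNMENT_METHODS = {"standard", "privileged", "auto"}


def build_assign_label_request(
    drive_id: str,
    item_id: str,
    label_id: str,
    assignment_method: str = "standard",
    justification: Optional[str] = None,
) -> tuple[str, str]:
    """Return (url, JSON body) for POST .../assignSensitivityLabel on one driveItem."""
    if assignment_method not in ASSIGNMENT_METHODS:
        raise ValueError(f"unsupported assignmentMethod: {assignment_method}")
    url = f"{GRAPH_BETA}/drives/{drive_id}/items/{item_id}/assignSensitivityLabel"
    payload = {"sensitivityLabelId": label_id, "assignmentMethod": assignment_method}
    if justification:
        # justificationText is optional and only included when provided.
        payload["justificationText"] = justification
    return url, json.dumps(payload)
```

As I understand the docs, a successful call returns `202 Accepted` with a `Location` header for monitoring the operation, so the label is applied asynchronously on the service side rather than by round-tripping the file through your code. That behavior is worth double-checking against the current version of the page.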

    Edit: Graph API Docs: https://learn.microsoft.com/en-us/graph/api/driveitem-assignsensitivitylabel?view=graph-rest-beta&tabs=http
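For the 10,000-document case in the question, one way to cut HTTP round trips is Graph JSON batching, which accepts up to 20 requests per `$batch` call. The sketch below groups assignSensitivityLabel calls into batch bodies. It assumes, without having tested it here, that this endpoint behaves normally inside a batch, and it uses `"standard"` as the assignment method. Each inner request is still individually subject to throttling, so batching reduces connection overhead rather than raising the service-side rate limit.

```python
from __future__ import annotations

from typing import Iterable, Iterator

MAX_BATCH_SIZE = 20  # Graph JSON batching limit per $batch request


def build_label_batches(
    items: Iterable[tuple[str, str]],  # (drive_id, item_id) pairs
    label_id: str,
    batch_size: int = MAX_BATCH_SIZE,
) -> Iterator[dict]:
    """Yield $batch request bodies holding at most batch_size label assignments each."""
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError(f"batch_size must be between 1 and {MAX_BATCH_SIZE}")
    batch: list[dict] = []
    for drive_id, item_id in items:
        batch.append(
            {
                "id": str(len(batch) + 1),  # ids only need to be unique within one batch
                "method": "POST",
                # URLs inside a batch are relative to the $batch endpoint's version segment.
                "url": f"/drives/{drive_id}/items/{item_id}/assignSensitivityLabel",
                "headers": {"Content-Type": "application/json"},
                "body": {"sensitivityLabelId": label_id, "assignmentMethod": "standard"},
            }
        )
        if len(batch) == batch_size:
            yield {"requests": batch}
            batch = []
    if batch:  # emit the final, partially filled batch
        yield {"requests": batch}
```

Each yielded body would be POSTed to `https://graph.microsoft.com/beta/$batch`. At 20 requests per batch, 10,000 documents become 500 batch calls and 10 million become 500,000, so throttling and retry handling (for example, honoring `Retry-After` on 429 responses) matter far more at the larger scale.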