pyspark, azure-synapse, shared-data, onpremises-gateway, shared-drive

Synapse notebook: access CSV / ZIP files from on-premises


I am trying to access different types of files (CSV, ZIP, etc.) on my company's on-premises shared drives. As different departments submit their documents, I am looking for Python / PySpark code to fetch these files. Company procedures won't allow us to change them and upload them into Blob Storage.


Solution

  • AFAIK, Synapse notebooks run in a cloud-based environment, so they don't have direct access to your local file system or on-premises shared drives. To work with those files in a Synapse notebook, you'll need to copy them to a cloud storage service such as Azure Blob Storage or Azure Data Lake Storage Gen2.

    If you don't want to use Blob Storage, your Synapse workspace has an Azure Data Lake Storage Gen2 (ADLS) account assigned to it. You can first copy the data from the on-prem shared folders into that ADLS account and then read the files from ADLS in the Synapse notebook.

    You can use a Synapse pipeline to load the data from on-prem to ADLS: a Copy activity with a File System linked service, running on a self-hosted integration runtime (SHIR), copies the files from the on-prem share.
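
    Once the pipeline has landed the files in ADLS, reading them from the notebook could look roughly like the sketch below. It is a minimal example, not a definitive implementation: the helper names (abfss_uri, unzip_members, read_landed_csv, read_landed_zips) and the container, account, and folder names in the usage comments are hypothetical placeholders. Spark has no built-in ZIP reader, so the sketch assumes the archives are small enough to unpack in memory with Python's zipfile module.

```python
import io
import zipfile


def abfss_uri(container: str, account: str, relative_path: str) -> str:
    """Build the abfss:// URI for a file or folder in an ADLS Gen2 account."""
    return (
        f"abfss://{container}@{account}.dfs.core.windows.net/"
        f"{relative_path.lstrip('/')}"
    )


def unzip_members(zip_bytes: bytes) -> dict:
    """Return {member name: raw bytes} for every file inside a ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {
            name: archive.read(name)
            for name in archive.namelist()
            if not name.endswith("/")  # skip directory entries
        }


def read_landed_csv(spark, uri: str):
    """Read CSV files that the pipeline has already copied into ADLS."""
    return spark.read.option("header", "true").csv(uri)


def read_landed_zips(spark, uri: str):
    """Return an RDD of (archive path, member name, member bytes) tuples."""
    return spark.sparkContext.binaryFiles(uri).flatMap(
        lambda kv: [
            (kv[0], name, data)
            for name, data in unzip_members(bytes(kv[1])).items()
        ]
    )


# Usage inside a Synapse notebook, where `spark` is already defined.
# "landing", "mystorageacct" and the folder names are placeholders (assumptions):
#
# csv_df = read_landed_csv(
#     spark, abfss_uri("landing", "mystorageacct", "departments/*.csv")
# )
# zip_rdd = read_landed_zips(
#     spark, abfss_uri("landing", "mystorageacct", "departments/*.zip")
# )
```

    The Spark session is passed in as a parameter so the path and ZIP helpers can be tried outside Synapse. The notebook also needs read permission on the storage account for the abfss:// paths to resolve.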