pyspark azure-synapse azure-data-lake

How to save an HTML file from an Azure Synapse notebook to Data Lake storage?


In Azure Synapse with PySpark, I am doing data profiling with ProfileReport (https://github.com/ydataai/ydata-profiling):

from ydata_profiling import ProfileReport

report = ProfileReport(dataframe,
                title="Profiling_pyspark_DataFrame",
                infer_dtypes=False,
                interactions=None,
                missing_diagrams=None,
                correlations={"auto": {"calculate": False},
                              "pearson": {"calculate": False},
                              "spearman": {"calculate": False}})

When I evaluate the report variable in a notebook cell, I see the HTML content that I would like to save to ADLS.

I then tried to save the HTML to the data lake with:

report.to_file("abfss://bronze@tests.dfs.core.windows.net/profile.html")

But I get the error:

FileNotFoundError: [Errno 2] No such file or directory: 'abfss:/bronze@tests.dfs.core.windows.net/profile.html'

Where am I going wrong? (I have a linked service between Synapse and ADLS.)


Solution

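  • Yes, you are right. I am posting this as an answer, along with some alternatives, so that it helps the community. The first option is to write the HTML string straight to ADLS with mssparkutils.fs.put (the last argument, True, overwrites an existing file):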

    mssparkutils.fs.put("abfss://data@jsynapadls.dfs.core.windows.net/synapse/report.html", profile.to_html(), True)
    

    Another way is to save the file to the local filesystem in Synapse first and then copy or move it to ADLS storage.

    profile.to_file("/tmp/report2.html")
    mssparkutils.fs.cp("file:/tmp/report2.html", "abfss://data@jsynapadls.dfs.core.windows.net/synapse/report2.html")
    

    or

    mssparkutils.fs.mv("file:/tmp/report2.html", "abfss://data@jsynapadls.dfs.core.windows.net/synapse/report3.html")
    

    When accessing the local filesystem in Synapse through mssparkutils, you need to prefix the path with file:/.
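
To tie the pieces of this thread together, here is a minimal sketch of two points. First, the single slash in the error message ('abfss:/bronze@...') is what you would expect if the URI is handled as an ordinary local path, since pathlib collapses the repeated slash; that `to_file` goes through pathlib internally is an assumption, not something confirmed in this thread. Second, the "write locally, then copy" approach can be wrapped in a small helper. The names `as_local_path`, `save_html_via_local` and the `copy_fn` parameter are hypothetical, introduced only for illustration; in Synapse you would pass `mssparkutils.fs.cp` as `copy_fn`.

```python
import tempfile
from pathlib import Path, PurePosixPath


def as_local_path(uri: str) -> str:
    """Show how a URI looks once it is treated as a plain POSIX path."""
    # pathlib collapses the repeated slash, so "abfss://" becomes "abfss:/".
    return str(PurePosixPath(uri))


def save_html_via_local(html: str, dest_uri: str, copy_fn) -> str:
    """Write html to a local temp file, then call copy_fn(src_uri, dest_uri).

    copy_fn is a hypothetical hook; in a Synapse notebook it would be
    mssparkutils.fs.cp. Returns the file:/ URI that was handed to copy_fn.
    """
    with tempfile.NamedTemporaryFile(
        "w", suffix=".html", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write(html)
        local_path = Path(tmp.name)

    # Local paths must carry the file:/ prefix, as noted in the answer.
    src_uri = f"file:{local_path.as_posix()}"
    try:
        copy_fn(src_uri, dest_uri)
    finally:
        # Remove the temporary file whether or not the copy succeeded.
        local_path.unlink()
    return src_uri
```

In a notebook, the call would look something like `save_html_via_local(profile.to_html(), "abfss://data@jsynapadls.dfs.core.windows.net/synapse/report2.html", mssparkutils.fs.cp)`. Passing the copy function in as an argument keeps the helper testable outside Synapse, where mssparkutils is not available.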