microsoft-fabric

Write a df from a Microsoft Fabric notebook into my local disk folder


How do I write a DataFrame in my Microsoft Fabric notebook to a folder on my local computer as an Excel file?

I have tried, but I keep getting an error that the folder directory does not exist. I have checked, and the local folder directory does exist.

If there is any workaround, I would appreciate it.

Thank you

Here is an example of the code:

from pyspark.sql import SparkSession
import pandas as pd

# Create a Spark session
spark = SparkSession.builder.appName("SaveDFtoExcel").getOrCreate()

# Create a sample PySpark DataFrame
data = [("Alice", 34), ("Bob", 45), ("Catherine", 29)]
columns = ["Name", "Age"]
spark_df = spark.createDataFrame(data, columns)

# Convert PySpark DataFrame to Pandas DataFrame
pandas_df = spark_df.toPandas()

# the local file path
file_path = "C:/Users/YourUsername/Documents/sample_data.xlsx"

# Save the Pandas DataFrame to an Excel file
pandas_df.to_excel(file_path, index=False, engine="openpyxl")

Here is the error message:

OSError: Cannot save file into a non-existent directory: 'C:/Users/YourUsername/Documents/sample_data.xlsx'

I am very confident that the directory exists, the path is correct, and I have permission to read and write to that directory.

It worked in my normal Positron or VS Code IDE, but it does not work in the Microsoft Fabric notebook.

Thank you.


Solution

  • To write a DataFrame from a Microsoft Fabric notebook to your local disk, you need to save the DataFrame to a temporary location within Microsoft Fabric (a lakehouse) first and then download it to your local system. The notebook code runs on remote Spark compute in Fabric, not on your PC, so a path like C:/Users/... does not exist from the notebook's point of view, which is why pandas reports a non-existent directory even though the folder exists locally.
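
    You can confirm this from the notebook itself. A minimal diagnostic sketch, assuming the default Linux-based Fabric Spark runtime (the path is the placeholder from the question):

    import os

    # The Fabric notebook runs on a Linux Spark cluster, not on your PC,
    # so a local Windows folder is not visible here.
    print(os.name)                                            # 'posix' on Fabric compute
    print(os.path.exists("C:/Users/YourUsername/Documents"))  # False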

    You can get the path to store the file in the lakehouse as below:

    [Screenshot: copying the file path from the lakehouse Files folder in the notebook's Lakehouse explorer]

    Sample code to save the DataFrame to a temporary location in the lakehouse:

    from pyspark.sql import SparkSession
    import pandas as pd
    
    spark = SparkSession.builder.appName("SaveDFtoExcel").getOrCreate()
    
    data = [("Alice", 34), ("Bob", 45), ("Catherine", 29)]
    columns = ["Name", "Age"]
    spark_df = spark.createDataFrame(data, columns)
    
    pandas_df = spark_df.toPandas()
    file_path = "Copied path from above step"
    
    pandas_df.to_excel(file_path, index=False, engine="openpyxl")
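
    To confirm the file landed in the lakehouse before building the pipeline, you can list the Files folder from the notebook. A minimal sketch, assuming a default lakehouse is attached to the notebook (mssparkutils ships with Fabric notebooks):

    from notebookutils import mssparkutils

    # List the attached lakehouse's Files folder; the Excel file should appear here
    for item in mssparkutils.fs.ls("Files/"):
        print(item.name, item.size)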
    

    After this, you can create a data pipeline to copy the file from the lakehouse to a local folder.

    Add the lakehouse file as the source, as below: [Screenshot: pipeline copy activity with the lakehouse file selected as the source]

    Add Folder as the destination, create a connection to your local machine (this typically requires the on-premises data gateway), and run the pipeline. [Screenshot: pipeline destination configured as a local folder connection]

    Alternatively, you can use OneLake file explorer to access Fabric data directly from Windows; check this document for more information: https://learn.microsoft.com/en-us/fabric/onelake/onelake-file-explorer
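
    If you prefer to script the download instead, OneLake exposes an ADLS Gen2-compatible endpoint, so the file can be pulled down from your local machine with the Azure SDK. A minimal sketch, run locally, assuming the azure-identity and azure-storage-file-datalake packages are installed; MyWorkspace and MyLakehouse are placeholders for your own workspace and lakehouse names:

    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    # OneLake speaks the ADLS Gen2 API; the "file system" is the workspace name
    service = DataLakeServiceClient(
        account_url="https://onelake.dfs.fabric.microsoft.com",
        credential=DefaultAzureCredential(),
    )
    fs = service.get_file_system_client("MyWorkspace")  # your workspace name

    # Path inside the workspace: <lakehouse name>.Lakehouse/Files/<file>
    file_client = fs.get_file_client("MyLakehouse.Lakehouse/Files/sample_data.xlsx")

    # Download to the local folder that the notebook could not reach directly
    with open("C:/Users/YourUsername/Documents/sample_data.xlsx", "wb") as f:
        f.write(file_client.download_file().readall())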