excelpandasemailpysparkpyspark-pandas

PySpark dataframe to Excel Email attachment with sheet name


I'm unable to send PySpark data frame as an attachment in Excel.

I'm able to do easily with CSV file using below,

email.add_attachment(df.toPandas().to_csv(index=False).encode('utf-8')
, maintype='application', subtype='csv', filename=file)

Unable do the same using excel,

email.add_attachment(df.toPandas().to_excel(file, index=False, sheet_name='FileReport'))

Any suggestion would be appreciated. Below is the code

from datetime import datetime, date 
import pandas as pd 
from pyspark.sql import Row 
from pyspark.sql import SparkSession 
spark = SparkSession.builder.getOrCreate() 
from email.message import EmailMessage

email = EmailMessage()
email['From'] = "dkfhskg"
email['To'] = "dsjagjsg"
email['Subject'] = "Report"

mail = 'Report'

email.set_content(mail , subtype='html')

_smtp_server = "kdfhsajgfjdsg"
_smtp_port = 10

smtp = smtplib.SMTP(_smtp_server, _smtp_port)

df = spark.createDataFrame([ 
    Row(Sno=1, Rank=4., RollNo='1000', Date=date(2000, 8, 1), 
        e=datetime(2000, 8, 1, 12, 0)), 
    Row(Sno=2, Rank=8., RollNo='1001', Date=date(2000, 6, 2),  
        e=datetime(2000, 6, 2, 12, 0)), 
    Row(Sno=4, Rank=5., RollNo='1002', Date=date(2000, 5, 3), 
        e=datetime(2000, 5, 3, 12, 0)) 
]) 


file = 'Report_' + str(datetime.now()) + '.xlsx'

email.add_attachment(df.toPandas().to_excel(file, index=False, sheet_name='FileReport'))

Solution

  • Short answer:

    to_csv is returning a string which is what you are calling add_attachment() with as parameter.

    to_csv docs

    to_excel is not returning anything, it is only used for its side-effects (e.g. writing the in memory content to a file), thus add_attachment() has nothing to attach.

    to_excel docs

    Possible Solution

    First do the writing of the excel file, then open that file and attach it separately to the email like this:

    file = 'Report_' + datetime.now().strftime('%Y%m%d_%H%M%S') + '.xlsx'
    df.toPandas().to_excel(file, index=False, sheet_name='FileReport')
    
    with open(file, 'rb') as f:
        email.add_attachment(f.read(), maintype='application', subtype='vnd.openxmlformats-officedocument.spreadsheetml.sheet', filename=file)