I'm unable to send PySpark data frame as an attachment in Excel.
I'm able to do easily with CSV file using below,
email.add_attachment(df.toPandas().to_csv(index=False).encode('utf-8')
, maintype='application', subtype='csv', filename=file)
Unable do the same using excel,
email.add_attachment(df.toPandas().to_excel(file, index=False, sheet_name='FileReport'))
Any suggestion would be appreciated. Below is the code
from datetime import datetime, date
import pandas as pd
from pyspark.sql import Row
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
from email.message import EmailMessage
email = EmailMessage()
email['From'] = "dkfhskg"
email['To'] = "dsjagjsg"
email['Subject'] = "Report"
mail = 'Report'
email.set_content(mail , subtype='html')
_smtp_server = "kdfhsajgfjdsg"
_smtp_port = 10
smtp = smtplib.SMTP(_smtp_server, _smtp_port)
df = spark.createDataFrame([
Row(Sno=1, Rank=4., RollNo='1000', Date=date(2000, 8, 1),
e=datetime(2000, 8, 1, 12, 0)),
Row(Sno=2, Rank=8., RollNo='1001', Date=date(2000, 6, 2),
e=datetime(2000, 6, 2, 12, 0)),
Row(Sno=4, Rank=5., RollNo='1002', Date=date(2000, 5, 3),
e=datetime(2000, 5, 3, 12, 0))
])
file = 'Report_' + str(datetime.now()) + '.xlsx'
email.add_attachment(df.toPandas().to_excel(file, index=False, sheet_name='FileReport'))
Short answer:
to_csv is returning a string which is what you are calling add_attachment() with as parameter.
to_excel is not returning anything, it is only used for its side-effects (e.g. writing the in memory content to a file), thus add_attachment() has nothing to attach.
Possible Solution
First do the writing of the excel file, then open that file and attach it separately to the email like this:
file = 'Report_' + datetime.now().strftime('%Y%m%d_%H%M%S') + '.xlsx'
df.toPandas().to_excel(file, index=False, sheet_name='FileReport')
with open(file, 'rb') as f:
email.add_attachment(f.read(), maintype='application', subtype='vnd.openxmlformats-officedocument.spreadsheetml.sheet', filename=file)