
How to write pandas-profiling report output as an HTML/JSON file to an AWS S3 location


I have a DataFrame df and generate a ProfileReport as shown below:

 import pandas_profiling

 profile = pandas_profiling.ProfileReport(
         df, title="file_name Data Profile Report", minimal=True)

After profiling, I successfully write the output to the local file system on an EC2 machine using the code below:

profile.to_file('processedDataPath/file_name-profile.html')

Now I want to write the profile output to an S3 bucket using awswrangler.s3, but I can't find an appropriate awswrangler.s3.to_xxx() function for writing an .html file to an S3 location, i.e. something like:

awswrangler.s3.to_xxx(profile, path='s3://analytics-storage-bucket/processedData/file_name-profile.html')

I'm looking for a Python method or code that can write the profiling output to an S3 location.


Solution

  • After generating the profile report as

     profile = pandas_profiling.ProfileReport(
             df, title="Data Profile Report",  minimal=True)
    

    One way to get the .html file into S3 is to write it to the local file system first, upload the local file to S3, and finally delete the local copy:

    import os
    import awswrangler

    # write the .html report locally, upload it to S3, then remove the local copy
    local_path = './file_name-profile.html'
    profile.to_file(local_path)
    try:
        awswrangler.s3.upload(local_file=local_path, path='s3://analytics-storage-bucket/processedData/file_name-profile.html')
    finally:
        os.remove(local_path)  # runs even if the upload fails
    

    This code works on EC2 and in an AWS Glue job.
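If writing a temporary file is undesirable, the report can also be rendered in memory: pandas-profiling's ProfileReport has to_html() and to_json() methods that return strings, and boto3's put_object can upload a string body directly. The sketch below assumes boto3 is installed and AWS credentials are configured. The helper names upload_profile_report and split_s3_uri are hypothetical, and the optional s3_client parameter exists only so the function can be exercised with a stub client.

```python
from typing import Any, Optional, Tuple
from urllib.parse import urlparse


def split_s3_uri(s3_uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(s3_uri)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"not a valid S3 object URI: {s3_uri!r}")
    return parsed.netloc, key


def upload_profile_report(profile: Any, s3_uri: str, fmt: str = "html",
                          s3_client: Optional[Any] = None) -> None:
    """Render a ProfileReport in memory and PUT it to S3 (no local temp file)."""
    # to_html() / to_json() return the rendered report as a string
    if fmt == "html":
        body, content_type = profile.to_html(), "text/html"
    elif fmt == "json":
        body, content_type = profile.to_json(), "application/json"
    else:
        raise ValueError("fmt must be 'html' or 'json'")

    if s3_client is None:
        import boto3  # imported lazily so a stub client can be passed in tests
        s3_client = boto3.client("s3")

    bucket, key = split_s3_uri(s3_uri)
    # Setting ContentType lets browsers render the HTML report when opened from S3
    s3_client.put_object(Bucket=bucket, Key=key,
                         Body=body.encode("utf-8"), ContentType=content_type)
```

For example, upload_profile_report(profile, 's3://analytics-storage-bucket/processedData/file_name-profile.html') uploads the HTML report, and passing fmt='json' with a .json key uploads the JSON version instead. The trade-off is that the whole report is held in memory, which is usually fine for minimal=True reports.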