python, amazon-web-services, amazon-s3, aws-lambda, aws-data-wrangler

Having issues with my Excel to CSV conversion in AWS Lambda using the AWS Data Wrangler layer


I have a function that reads an Excel file into a DataFrame and then saves that DataFrame to an S3 bucket using the awswrangler to_csv function. The Excel file has data starting from different rows and columns.

My conversion code looks something like this:

import awswrangler as wr

# Read the sheet: the header is on row 6 (header = 5) and the first 3 columns are dropped
df = wr.s3.read_excel(
              path = 's3://bucket/filepath/',
              sheet_name = 'sheetname',
              header = 5,
              index_col = False).iloc[:, 3:]

# Write the DataFrame back to S3 as CSV
wr.s3.to_csv(df, path = 's3://bucket/filepath/',
                 dataset = True,
                 filename_prefix = 'sheetname')

The code works properly, but I am having two issues:

  1. The CSV output contains an additional index column even though I have specified index_col = False.
  2. The CSV output file name comes with additional serial numbers that I don't want; I just want something like sheetname.csv. I looked at the documentation and didn't find any good leads on how to do that.

The current output looks something like sheetname8f1e8wefdf15f4wgfe5fef46we54f.csv


Solution

  • You can specify the full path, including the file name, in the path parameter and drop dataset = True and filename_prefix entirely. The extra characters in the file name are the unique suffix Wrangler generates for each file it writes in dataset mode; filename_prefix only prepends to that generated name.

    As for the index column: read_excel's index_col controls how the file is read, but the extra column is written by to_csv, so disable it in the output by setting index = False.

    wr.s3.to_csv(df, path = "s3://bucket/filepath/sheetname.csv", 
                     index = False)
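
    Putting it together, a minimal sketch of the whole conversion with those changes applied. The source key workbook.xlsx is a hypothetical placeholder, since the question only shows a truncated path:

    import awswrangler as wr

    # Read the sheet, skipping down to the header row and dropping the first 3 columns
    df = wr.s3.read_excel(
        path = 's3://bucket/filepath/workbook.xlsx',   # hypothetical source key
        sheet_name = 'sheetname',
        header = 5,
        index_col = False).iloc[:, 3:]

    # Write to an exact key: no dataset mode, no filename_prefix, no index column
    wr.s3.to_csv(df, path = 's3://bucket/filepath/sheetname.csv',
                     index = False)

    With dataset = False (the default), path is treated as the exact object key, so the output lands at sheetname.csv with no generated suffix.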