Tags: python, amazon-web-services, amazon-s3, pyspark, boto3

How to retrieve only the file name from an S3 folder path using PySpark


Hi, I have an AWS S3 bucket that contains several folders and subfolders.

I need to retrieve only the file name, whichever folder the file is in. How do I go about it?

S3 bucket name - abc

path - s3://abc/ann/folder1/folder2/folder3/file1

path - s3://abc/ann/folder1/folder2/file2

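Code tried so far:

   import boto3

   s3 = boto3.client("s3")  # the service name must be a string
   lst_obj = s3.list_objects(Bucket="abc", Prefix="ann/")  # parameter names are capitalized
   lst_obj["Contents"]  # the response key is "Contents"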

I then loop over all the contents:

   for file in lst_obj["Contents"]:
       ...  # do something with each object

Here file["Key"] gives me the whole path, but I just need the file name.


Solution

  • You can extract the name by splitting the object's Key on the / character and taking the last element:

    for file in lst_obj["Contents"]:
        name = file["Key"].split("/")[-1]  # last path component is the file name
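
    To make the idea above easy to try without AWS credentials, here is a minimal sketch that applies the same "take the last path component" step to a hand-written list shaped like the "Contents" entries of a list_objects response. The helper name filename_from_key and the sample_contents list are hypothetical, introduced only for illustration; the "ann/" placeholder entry is an assumption about what a listing might contain, since S3 can return zero-byte keys ending in / for folders created in the console. It uses posixpath.basename, which gives the same result as split("/")[-1].

```python
import posixpath


def filename_from_key(key):
    """Return the last path component of an S3 object key (hypothetical helper)."""
    # S3 keys always use "/" as the separator, so posixpath is used
    # instead of os.path, which would behave differently on Windows.
    return posixpath.basename(key)


# Hand-written stand-in for lst_obj["Contents"]; not real API output.
sample_contents = [
    {"Key": "ann/"},  # assumed folder placeholder key
    {"Key": "ann/folder1/folder2/folder3/file1"},
    {"Key": "ann/folder1/folder2/file2"},
]

names = [
    filename_from_key(obj["Key"])
    for obj in sample_contents
    if not obj["Key"].endswith("/")  # skip keys that only represent a folder
]
print(names)
```

    With a real response, sample_contents would be replaced by lst_obj["Contents"]. Skipping keys that end in / avoids collecting empty strings for folder placeholders.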