pythonlistloopsdirectoryos.walk

Why do I get a ''FileNotFoundError'' in python?


I have a list of the xlsx files in my directory and subdirectories and I want to loop through this list with certain conditions. Now it seems that the code works for the main directory, but it has troubles opening the files within the subdirectories.. I used the os.walk method but I still get the error ''[Errno 2] No such file or directory: 'file name''' . The error occurs at the last piece of the code, the part that starts with 'for f in files: if f.endswith('.xlsx'): and so on..

How to fix this problem?

path = os.getcwd()
files = os.listdir(path)

directories = ['2018', '2017', '2016', '2015']

for directory in directories:
   directory_path = os.path.join(path, directory)
   files_in_directory = os.listdir(directory_path)
   for file in files_in_directory:
       files.append(file)


 filtered_files_list = []

 for f in files:
    if f.endswith('.xlsx'):      
       wb = openpyxl.load_workbook(f)
       if "2014" in wb.sheetnames:
           filtered_files_list.append(f)

 for root, dirs, files in os.walk(path):
   if root.endswith("2018") or root.endswith("2017") or root.endswith("2016") or root.endswith("2015"):
        for f in files:
           if f.endswith('.xlsx'):               
               wb = openpyxl.load_workbook(os.path.join(root, f))
               if "2014" in wb.sheetnames:
                   filtered_files_list.append(f)

print(filtered_files_list)

Solution

  • Your listdir + walk combination sounds like it could be simplified with a pathlib.Path.glob, which will also give you full paths without the need to join.

    from pathlib import Path
    from openpyxl import load_workbook
    
    filtered_files_list = []
    
    filter_directories = {"2015", "2016", "2017", "2018"}  # set for fast search
    
    p = Path(".")  # equivalent to getcwd
    for xlsx in p.glob("./**/*.xlsx"):  # recursive search for all XLSX under CWD
        if str(xlsx.parents[-2]) not in filter_directories:  # skip if not in filter_directories
            continue
        wb = openpyxl.load_workbook(xlsx)
        if "2014" in wb.sheetnames:
            filtered_files_list.append(xlsx)
    

    In the following hierarchy it finds:

    .
    ├── 2015
    │   ├── has-2014-sheet.xlsx
    │   └── no-2014-sheet.xlsx
    ├── 2016
    │   └── sub
    │       ├── has-2014-sheet.xlsx
    │       └── no-2014-sheet.xlsx
    ├── 2020
    │   ├── has-2014-sheet.xlsx
    │   └── no-2014-sheet.xlsx
    └── other
        ├── has-2014-sheet.xlsx
        └── no-2014-sheet.xlsx
    
    [PosixPath('2015/has-2014-sheet.xlsx'),
     PosixPath('2016/sub/has-2014-sheet.xlsx')]