python list file directory subdirectory

How can I specify the directory I want to get using the os library?


I have a directory called "Data" that contains 35 directories, each of which contains several more directories. I need to check whether those innermost directories contain .txt files and, if so, get the name of the one of the 35 directories they belong to. After that, I want to use the pandas library to generate a "yes/no" spreadsheet: "yes" for each of the 35 directories that has .txt files and "no" for each that does not.

For now, I could write the following as a test:

import os

w = os.listdir(r'C:\Users\Name\New\Data')
for name in w:
    print(name)

This gives me the names of the 35 main directories I am interested in (Folder1, Folder2, Folder3, ..., Folder35).

I also wrote:

for root, dirs, files in os.walk(r'C:\Users\Name\New\Data'):
    for file in files:
        if file.endswith('.txt'):
            print(root)

But this prints the whole path, such as "C:\Users\Name\New\Data\Folder1\Folder1-1", whereas what I really need is to compare the name "Folder1" to the entries of the aforementioned list.

How can I check if the element in "w[]" corresponds to the name in "root"?


Solution

  • To check whether each of the 35 main directories contains .txt files anywhere beneath it, walk the tree once and map every directory that holds a .txt file back to the main directory it belongs to. The code below does this and writes the yes/no spreadsheet with pandas.

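    import os
    import pandas as pd
    
    main_dir = r'C:\Users\Name\New\Data'
    
    # Keep only directories, so stray files in main_dir are not reported as folders.
    main_folders = [name for name in os.listdir(main_dir)
                    if os.path.isdir(os.path.join(main_dir, name))]
    results = {folder: "No" for folder in main_folders}
    
    for root, dirs, files in os.walk(main_dir):
        if any(file.endswith('.txt') for file in files):
            # The first path component below main_dir is the main folder,
            # however deep root is. basename(dirname(root)) only works when
            # the .txt files sit exactly two levels below main_dir.
            main_folder_name = os.path.relpath(root, main_dir).split(os.sep)[0]
            if main_folder_name in results:
                results[main_folder_name] = "Yes"
    
    # Writing .xlsx requires an Excel engine such as openpyxl to be installed.
    df = pd.DataFrame(list(results.items()), columns=['Folder', 'Has_txt_file'])
    output_path = r'C:\Users\Name\New\output.xlsx'
    df.to_excel(output_path, index=False)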
    

    I hope this will help you a little.
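
To answer the narrower question of how to match an entry of the directory listing against "root", here is a minimal sketch that can be run without the real "Data" directory. It builds a small throwaway tree with tempfile and takes the first path component of "root" relative to the main directory. The helper names top_level_folder and folders_with_txt, and the Folder1/Folder2 layout, are hypothetical and only there for illustration.

```python
import os
import tempfile


def top_level_folder(root, main_dir):
    """Return the first path component of root below main_dir.

    Returns None when root is main_dir itself.
    """
    rel = os.path.relpath(root, main_dir)
    if rel == os.curdir:
        return None
    return rel.split(os.sep)[0]


def folders_with_txt(main_dir):
    """Map each immediate subdirectory of main_dir to "Yes" or "No"."""
    results = {
        entry.name: "No"
        for entry in os.scandir(main_dir)
        if entry.is_dir()
    }
    for root, dirs, files in os.walk(main_dir):
        if any(name.endswith(".txt") for name in files):
            folder = top_level_folder(root, main_dir)
            if folder in results:
                results[folder] = "Yes"
    return results


# Build a small throwaway tree: Folder1 has a .txt file three levels down,
# Folder2 has none.
with tempfile.TemporaryDirectory() as data_dir:
    os.makedirs(os.path.join(data_dir, "Folder1", "Folder1-1", "deeper"))
    os.makedirs(os.path.join(data_dir, "Folder2", "Folder2-1"))
    txt_path = os.path.join(data_dir, "Folder1", "Folder1-1", "deeper", "notes.txt")
    with open(txt_path, "w") as fh:
        fh.write("example")
    demo_results = folders_with_txt(data_dir)
    print(demo_results)
```

Because the folder name comes from the relative path rather than from a fixed number of dirname calls, the lookup still works if the .txt files sit deeper or shallower than two levels below the main directory.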