python glob pathlib

How to iterate through files using pathlib.glob() when files share very similar names


My directory looks like this:

P1_AAA_NOT_SAMPLE.csv
P1_AAA_SAMPLE.csv
P1_BBB_NOT_SAMPLE.csv
P1_BBB_SAMPLE.csv
P1_CCC_NOT_SAMPLE.csv
P1_CCC_SAMPLE.csv

P2_AAA_NOT_SAMPLE.csv
P2_AAA_SAMPLE.csv
P2_BBB_NOT_SAMPLE.csv
P2_BBB_SAMPLE.csv
P2_CCC_NOT_SAMPLE.csv
P2_CCC_SAMPLE.csv

How do I iterate through the files in this directory using pathlib.glob() so that I capture only the SAMPLE files (i.e., I don't want the NOT_SAMPLE files)?

My code looks like this:

from pathlib import Path

file_path = r'C:\Users\HP\Desktop\My Directory'

for fle in Path(file_path).glob('P*_*_SAMPLE.csv'):
    # do something with each SAMPLE file
    print(fle)

But this code captures both the SAMPLE files and the NOT_SAMPLE files. Is there a way to adjust the wildcards or the glob() call so that it captures only the SAMPLE files, preferably using pathlib?

Thanks in advance.


Solution

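  • The * wildcard can match underscores too, so 'P*_*_SAMPLE.csv' also matches names like P1_AAA_NOT_SAMPLE.csv. You can filter the results in a generator expression (or a list comprehension), like this:

    for fle in (p for p in Path(file_path).glob('P*_*_SAMPLE.csv') if 'NOT_SAMPLE' not in p.name):
        print(fle)  # do something with each SAMPLE file

    or build the list first:

    valid_paths = [p for p in Path(file_path).glob('P*_*_SAMPLE.csv') if 'NOT_SAMPLE' not in p.name]

    for fle in valid_paths:
        print(fle)  # do something with each SAMPLE file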
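
    If a plain substring check feels too loose, another option is to test each file name against a regular expression after globbing. The sketch below is only an illustration: SAMPLE_NAME and iter_sample_files are made-up names, and it assumes the middle part of the file name (AAA, BBB, ...) never contains an underscore. The demo runs in a temporary directory so it does not touch your real folder.

```python
import re
import tempfile
from pathlib import Path

# Assumed naming scheme: "P<digits>_<token without underscores>_SAMPLE.csv".
# This pattern is an illustration, not something taken from the question.
SAMPLE_NAME = re.compile(r"P\d+_[^_]+_SAMPLE\.csv")


def iter_sample_files(directory):
    """Yield only the SAMPLE files in `directory`, skipping NOT_SAMPLE ones."""
    for p in Path(directory).glob("P*_*_SAMPLE.csv"):
        # Check the file name only, so a parent folder whose name happens
        # to contain "NOT_SAMPLE" cannot affect the result.
        if SAMPLE_NAME.fullmatch(p.name):
            yield p


# Small demo with empty files in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in (
        "P1_AAA_SAMPLE.csv",
        "P1_AAA_NOT_SAMPLE.csv",
        "P2_BBB_SAMPLE.csv",
        "P2_BBB_NOT_SAMPLE.csv",
    ):
        (Path(tmp) / name).touch()

    found = sorted(p.name for p in iter_sample_files(tmp))
    print(found)
```

    If the names are always as regular as in the listing (one character after P and exactly three in the middle), a stricter glob such as 'P?_???_SAMPLE.csv' should also exclude the NOT_SAMPLE files without any extra filtering, since ? matches exactly one character.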