
Read multiple files, search for string and store in a list


I am trying to search through a list of files for the word 'type' and the word that follows it, then put the matches into a list along with the file name. For example, this is the output I am looking for:

File Name, Type

[1.txt, [a, b, c]]
[2.txt, [a,b]]

My current code returns a list for every type.

[1.txt, [a]]
[1.txt, [b]]
[1.txt, [c]]
[2.txt, [a]]
[2.txt, [b]]

Here is my code. I know my logic appends a single value to the list each time, but I'm not sure how to edit it so that each entry is just the file name with a list of types.

output = []
for file_name in find_files(d):
    with open(file_name, 'r') as f:
        for line in f:
            line = line.lower().strip()
            match = re.findall('type ([a-z]+)', line)
            if match:
                output.append([file_name, match])

Solution

  • Learn to categorize your actions at the proper loop level. In this case, you say that you want to accumulate all of the references into a single list, but then your code creates one output line per reference, rather than one per file. Change that focus:

    import re

    output = []
    for file_name in find_files(d):
        ref_list = []
        with open(file_name, 'r') as f:
            for line in f:
                line = line.lower().strip()
                # findall returns a list of every match on this line,
                # so extend (not append) keeps ref_list flat
                ref_list.extend(re.findall('type ([a-z]+)', line))

        # Once you've been through the entire file,
        #   THEN you add a line for that file,
        #    with the entire reference list
        output.append([file_name, ref_list])
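
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea wrapped in a function. The name `collect_types` and the `file_names` parameter are made up for illustration; they stand in for whatever `find_files(d)` yields in the question. Note that `re.findall` already returns a list, so the per-line matches are merged with `extend` to get a flat list per file.

```python
import re

# Same pattern as in the question: the word "type", one space, then a lowercase word
TYPE_PATTERN = re.compile(r'type ([a-z]+)')


def collect_types(file_names):
    """Return one [file_name, [types...]] entry per file (hypothetical helper)."""
    output = []
    for file_name in file_names:
        ref_list = []  # reset once per file, not once per line
        with open(file_name, 'r') as f:
            for line in f:
                line = line.lower().strip()
                # findall gives a list of matches for this line; extend keeps it flat
                ref_list.extend(TYPE_PATTERN.findall(line))
        # One output entry per file, added after the whole file has been read
        output.append([file_name, ref_list])
    return output
```

Calling `collect_types(find_files(d))` should then give one entry per file. A file with no matches still gets an entry with an empty list; filter those out afterwards if you don't want them.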