python file-search

How to search for and read a text file in a specific folder faster using Python


I've written a simple Python script that searches for a log file in a folder (which has approx. 4 million files) and reads it. Currently, the entire operation takes 20 seconds on average. Is there a way to get the response faster?

Below is my script:

import re
import os
import timeit
from datetime import date

log_path = "D:\\Logs Folder\\"
rx_file_name = r"[0-9a-z]{8}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{12}"
log_search_script = True
today = str(date.today())

while log_search_script:

    try:

        log_search = input("Enter image file name: ")

        file_name = re.search(rx_file_name, log_search).group()

        log_file_name = str(file_name) + ".log"

        print(f"\nLooking for log file '{log_file_name}'...\n")
        pass

    except:
        print("\n ***** Invalid input. Try again! ***** \n")
        continue

    start = timeit.default_timer()

    if log_file_name in os.listdir(log_path):

        log_file = open(log_path + "\\" + log_file_name, 'r', encoding="utf8")

        print('\n' + "--------------------------------------------------------" + '\n')

        print(log_file.read())
        log_file.close()

        print('\n' + "--------------------------------------------------------" + '\n')

        print("Time Taken: " + str(timeit.default_timer() - start) + " seconds")

        print('\n' + "--------------------------------------------------------" + '\n')

    else:
        print("Log File Not Found")

    search_again = input('\nDo you want to search for another log ("y" / "n") ?').lower()
    if search_again[0] == 'y':
        print("======================================================\n\n")
        continue

    else:
        log_search_script = False


Solution

  • Your problem is the line:

    if log_file_name in os.listdir(log_path):
    

    This has two problems:

    1. os.listdir builds a list of every name in the folder (about 4 million entries here), which can take a lot of time (and space...).
    2. the ... in ... part then goes over that huge list linearly, comparing names one by one until it finds the file.

    Instead, let your OS do the hard work and "ask for forgiveness, not permission". Just assume the file is there and try to open it. If it is not actually there, a FileNotFoundError will be raised, which we will catch:

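    try:
        # Open the file directly; no directory listing is needed.
        with open(log_path + "\\" + log_file_name, 'r', encoding="utf8") as log_file:
            print(log_file.read())
    except FileNotFoundError:
        # Raised by open() when the file does not exist.
        print("Log File Not Found")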
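
As a self-contained sketch of the same idea, the try/except could be wrapped in a small helper. The name read_log and its parameters are hypothetical and not part of the original script. os.path.join is used here instead of manual concatenation, so a trailing separator on the folder path does not matter. The demo at the bottom uses a temporary folder only so that the snippet runs anywhere.

```python
import os
import tempfile


def read_log(log_dir, log_file_name, encoding="utf8"):
    """Return the log file's text, or None if the file does not exist.

    Hypothetical helper: it opens the file directly instead of listing
    the folder first, so the cost does not depend on how many other
    files the folder holds.
    """
    path = os.path.join(log_dir, log_file_name)
    try:
        with open(path, "r", encoding=encoding) as log_file:
            return log_file.read()
    except FileNotFoundError:
        # The file is not there; let the caller decide what to print.
        return None


if __name__ == "__main__":
    # Small demo in a throwaway folder (an assumption for illustration).
    with tempfile.TemporaryDirectory() as demo_dir:
        demo_path = os.path.join(demo_dir, "example.log")
        with open(demo_path, "w", encoding="utf8") as f:
            f.write("demo log line\n")

        print(read_log(demo_dir, "example.log"))
        print(read_log(demo_dir, "missing.log"))
```

In the original loop, the call would replace the whole "if log_file_name in os.listdir(log_path)" branch: print the returned text when it is not None, and print "Log File Not Found" otherwise.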