python, intellij-idea, n-gram, file-search

Python IntelliJ style 'search everywhere' algorithm


I have a list of file names in Python like this:

HelloWorld.csv
hello_windsor.pdf
some_file_i_need.jpg
san_francisco.png
Another.file.txt
A file name.rar

I am looking for an IntelliJ style search algorithm where you can enter whole words or simply the first letter of each word in the file name, or a combination of both. Example searches:

hw -> HelloWorld.csv, hello_windsor.pdf
hwor -> HelloWorld.csv
winds -> hello_windsor.pdf

sf -> some_file_i_need.jpg, san_francisco.png
sfin -> some_file_i_need.jpg
file need -> some_file_i_need.jpg
sfr -> san_francisco.png

file -> some_file_i_need.jpg, Another.file.txt, A file name.rar
file another -> Another.file.txt
fnrar -> A file name.rar

You get the idea.

Are there any Python packages that can do this? Ideally they'd also rank matches by 'frecency' (how often and how recently the files have been accessed) as well as by how strong the match is.

I know PyLucene is one option, but it seems very heavyweight given that the list of file names is short and I have no interest in searching the contents of the files. Are there any other options?


Solution

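  • You can do this with a small function built on Python's regular expression module (import re). Each word of the search term is turned into a pattern that matches its letters in order with anything in between, so this is a loose subsequence match rather than a strict match on word starts.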

    import re
    def intellij_search(search_term, file_list):
        words = search_term.split()
    
        # Collect the file names that match every word
        matching_files = []
        for file_name in file_list:
            # Assume a match until one of the words fails
            matches_all_words = True
    
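            # Iterate over each word in the search term
            for word in words:
                # Build a subsequence pattern, e.g. 'hw' -> 'h.*w'; escape each
                # character so that '.' or '+' in the search is matched literally
                pattern = '.*'.join(re.escape(char) for char in word)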
    
                # Check if the file name matches the pattern
                if not re.search(pattern, file_name, re.IGNORECASE):
                    # One word failed, so stop checking this file name
                    matches_all_words = False
                    break
    
            # Keep the file name only if every word in the search term matched
            if matches_all_words:
                matching_files.append(file_name)
    
        # Return the matching file names
        return matching_files
    
    
    files = ['HelloWorld.csv', 'hello_windsor.pdf', 'some_file_i_need.jpg', 'san_francisco.png', 'Another.file.txt', 'A file name.rar']
    print(intellij_search('hw', files))
    print(intellij_search('sf', files))
    print(intellij_search('Afn', files))
    

    I am not sure whether this is exactly what you are looking for.
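
If the loose subsequence match returns too many files, a stricter variant is to split each file name into words and require every part of the search term to start a word, which is closer to the behaviour described in the question. The following is only a minimal sketch under that assumption, not IntelliJ's actual algorithm. All names in it (split_words, matches_token, word_start_search, frecency, rank_by_frecency, half_life_days, and the stats mapping of file name to access count and days since last access) are hypothetical, and the frecency formula is just one simple choice: the access count decayed by a half-life.

```python
import re


def split_words(file_name):
    """Split a file name into lowercase words at separators and camelCase humps."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", file_name)
    return [part.lower() for part in parts]


def matches_token(token, words):
    """Return True if token can be consumed as prefixes of successive words.

    Words may be skipped, but their order is preserved within one token.
    """
    if not token:
        return True
    for index, word in enumerate(words):
        # Try the longest possible prefix of this word first, then shorter ones.
        for length in range(min(len(token), len(word)), 0, -1):
            if word.startswith(token[:length]) and matches_token(
                token[length:], words[index + 1:]
            ):
                return True
    return False


def word_start_search(search_term, file_list):
    """Return the file names in which every space-separated token matches."""
    tokens = search_term.lower().split()
    results = []
    for file_name in file_list:
        words = split_words(file_name)
        if all(matches_token(token, words) for token in tokens):
            results.append(file_name)
    return results


def frecency(access_count, days_since_last_access, half_life_days=30.0):
    """Hypothetical score: access count halved for every half_life_days of age."""
    return access_count * 0.5 ** (days_since_last_access / half_life_days)


def rank_by_frecency(matches, stats):
    """Sort matches by descending frecency; files without stats score zero."""
    def score(file_name):
        access_count, days_since_last_access = stats.get(file_name, (0, 0.0))
        return frecency(access_count, days_since_last_access)

    return sorted(matches, key=score, reverse=True)


files = ['HelloWorld.csv', 'hello_windsor.pdf', 'some_file_i_need.jpg',
         'san_francisco.png', 'Another.file.txt', 'A file name.rar']

# Made-up access statistics: file name -> (access count, days since last access)
stats = {'HelloWorld.csv': (2, 40.0), 'hello_windsor.pdf': (10, 1.0)}

print(word_start_search('hwor', files))
print(word_start_search('file another', files))
print(rank_by_frecency(word_start_search('hw', files), stats))
```

Tokens separated by spaces are checked independently, so 'file another' can match Another.file.txt even though the words appear in the opposite order. Punctuation in the search term is not handled here, because the words are extracted without their separators.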