
String search with dynamic lengths against a file


The task I am trying to perform is as follows. Given a string such as "Simultaneously", I would like to search for progressively shortened versions of it in a file that contains some random sentences, and report the number of occurrences of each matched string. The search would be for each of the following strings:

Simltaneously
Simtaneously
Simaneously
Simneously
Simeously
Simously
Simusly
Simsly

in turn, against a file containing random sentences:

Tiled say decay **Simeously** spoil now walls meant house. 
My mr interest thoughts **Simeously** screened of outweigh removing. 
Lose hill well up will he over on. 
Increasing **Simeously** sufficient everything men him admiration unpleasing. 
Around really his use uneasy longer him man. 
His our **Simeously** pulled nature elinor talked now for excuse result. 
Wicket longer admire do barton vanity itself do in it. 
Preferred to **Simaneously** sportsmen it engrossed listening. 
Park gate sell they west hard for the. 
Up colonel so between removed **Simtaneously** so do. 
Years use place decay sex worth drift age. 
Men lasting out end **Simtaneously** article express fortune demands own charmed.

The expected outcome should be


matched string
matched line
occurrence

for example

Simaneously
Preferred to Simaneously sportsmen it engrossed listening. 
Occurrence of Simaneously is: 1

Simeously
Tiled say decay Simeously spoil now walls meant house. 
My mr interest thoughts Simeously screened of outweigh removing. 
Increasing Simeously sufficient everything men him admiration unpleasing. 
His our Simeously pulled nature elinor talked now for excuse result. 
Occurrence of Simeously is: 4

Simtaneously
Up colonel so between removed Simtaneously so do. 
Men lasting out end Simtaneously article express fortune demands own charmed.
Occurrence of Simtaneously is: 2

The code I have tried is

word = "Simultaneously"
count = 0
res_word = ""

with open("Sequences2.txt", 'r') as seq:
    lines = seq.readlines()
    for i in range(3, len(word)-3):
        res_word = "Sim" + word[i+1:]
        for line in lines:
            if res_word in line:
                count = count + 1
                print(line)
                print("Occurrence of", res_word, "is:", count)
                print(res_word)
        i = i + 1

However, the current output neither conforms to the expected format nor reports the correct number of occurrences. How can I get it right?

Current Output

Simtaneously
Up colonel so between removed Simtaneously so do. 

Occurrence of Simtaneously is: 1
Simtaneously
Men lasting out end Simtaneously article express fortune demands own charmed.
Occurrence of Simtaneously is: 2
Simaneously
Preferred to Simaneously sportsmen it engrossed listening. 

Occurrence of Simaneously is: 3
Simeously
Tiled say decay Simeously spoil now walls meant house. 

Occurrence of Simeously is: 4
Simeously
My mr interest thoughts Simeously screened of outweigh removing. 

Occurrence of Simeously is: 5
Simeously
Increasing Simeously sufficient everything men him admiration unpleasing. 

Occurrence of Simeously is: 6
Simeously
His our Simeously pulled nature elinor talked now for excuse result. 

Occurrence of Simeously is: 7


Solution

  • Counting occurrences using Python's re package (regex).

    This version just prints every search word and its number of occurrences:

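    import re
    
    word = "Simultaneously"
    
    with open("Sequences2.txt", 'r') as seq:
        text = seq.read()  # whole file as one string
        for _ in range(8):
            # Drop the character right after the "Sim" prefix:
            # Simltaneously, Simtaneously, ..., Simsly
            word = word[:3] + word[4:]
            # re.escape keeps the word literal if it ever contains regex metacharacters
            count = len(re.findall(re.escape(word), text))
            print(f"Occurrence of {word} is: {count}")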
    

    And this version prints the output in the format of your expected outcome, including the lines that contain the searched word:

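    import re
    
    word = "Simultaneously"
    
    with open("Sequences2.txt", 'r') as seq:
        text = seq.read()  # whole file as one string
        for _ in range(8):
            # Drop the character right after the "Sim" prefix
            word = word[:3] + word[4:]
            count = len(re.findall(re.escape(word), text))
            if count == 0:
                continue  # skip variants that never appear in the file
            print(word)
            # Print every line that contains the current variant
            for line in text.splitlines():
                if word in line:
                    print(line)
    
            print(f"Occurrence of {word} is: {count}")
            print()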
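
As for why the code in the question reports wrong numbers: `count` is initialised once, before both loops, and never reset, so it keeps accumulating across all search words, and the "Occurrence of" line is printed inside the inner loop for every matching line. A minimal sketch of the same idea without `re` is shown below. It assumes that "occurrence" means the number of matching lines, as in the expected outcome, and the helper names `shortened_variants` and `search_variants` are made up for this illustration. It works on a list of lines, so the lines of `Sequences2.txt` can be passed in after reading the file.

```python
def shortened_variants(word, prefix_len=3, min_tail=3):
    """Yield word with more and more characters removed right after the prefix."""
    prefix = word[:prefix_len]
    # Stop once only min_tail characters are left after the prefix
    for start in range(prefix_len + 1, len(word) - min_tail + 1):
        yield prefix + word[start:]


def search_variants(word, lines):
    """Map each variant that occurs to the list of lines containing it."""
    results = {}
    for variant in shortened_variants(word):
        # A fresh list per variant, so counts never leak between variants
        matched = [line.rstrip("\n") for line in lines if variant in line]
        if matched:
            results[variant] = matched
    return results


if __name__ == "__main__":
    # Sample lines taken from the question; replace with the lines of the real file
    sample = [
        "Tiled say decay Simeously spoil now walls meant house.",
        "Lose hill well up will he over on.",
        "Preferred to Simaneously sportsmen it engrossed listening.",
        "Up colonel so between removed Simtaneously so do.",
    ]
    for variant, matched in search_variants("Simultaneously", sample).items():
        print(variant)
        for line in matched:
            print(line)
        print(f"Occurrence of {variant} is: {len(matched)}")
        print()
```

Note that counting matching lines and counting with `re.findall` can give different numbers when a search word appears more than once on the same line.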