I have been trying to solve bioinformatics problems from the Rosalind.info website, and I am now facing some trouble when I want to perform some simple testing.
My project is structured in the following way:
Rosalind-problems/
├─ bioinformatics_stronghold/
│ ├─ data/
│ ├─ modules/
│ │ ├─ __init__.py
│ │ ├─ read_fasta.py
│ ├─ CONS.py
│ ├─ IEV.py
├─ tests/
│ ├─ __init__.py
│ ├─ test_CONS.py
│ ├─ test_IEV.py
The goal here is to be able to test all the individual files (CONS.py, IEV.py etc.) in the bioinformatics stronghold folder. The problem that I have encountered however is:
See all the affected files below:
test_IEV.py
import pytest
from bioinformatics_stronghold.IEV import calculate_offspring

def test_calculate_offspring():
    assert calculate_offspring([1, 0, 0, 1, 0, 1]) == 3.5
    assert calculate_offspring([1, 1, 1, 1, 1, 1]) == 8.5
IEV.py
def calculate_offspring(input_list: list[int]) -> float:
    """This function will take an input list of non-negative integers no larger than 20,000. The function will then calculate the expected offspring showing the dominant phenotype.
    Args:
        input_list (list): Input a list of integers representing the number of couples
    Returns:
        float: The expected number of offspring
    """
    expected_dominant_offspring = 0
    # For all cases, it is assumed that all couples will have exactly 2 offspring
    for index, count in enumerate(input_list):
        print("Index:", index, " ", "Num couples:", count)
        # Case AA-AA, all offspring will be dominant phenotype
        if index == 0:
            expected_dominant_offspring += count * 2 * 1
        # Case AA-Aa, all offspring will be dominant phenotype
        elif index == 1:
            expected_dominant_offspring += count * 2 * 1
        # Case AA-aa, all offspring will be dominant phenotype
        elif index == 2:
            expected_dominant_offspring += count * 2 * 1
        # Case Aa-Aa, 3 out of 4 offspring will be dominant phenotype
        elif index == 3:
            expected_dominant_offspring += count * (2 * (3/4))
        # Case Aa-aa, 2 out of 4 offspring will be dominant phenotype
        elif index == 4:
            expected_dominant_offspring += count * (2 * (2/4))
        # Case aa-aa, no offspring will be dominant phenotype
        elif index == 5:
            expected_dominant_offspring += count * 2 * 0
    print(expected_dominant_offspring)
    return expected_dominant_offspring
These two work just fine.
Now to the problematic files...
test_CONS.py
import pytest
from bioinformatics_stronghold.CONS import find_consensus_sequence
def test_find_consensus_sequence():
    # Parenthesised so the whole (profile_matrix, consensus_sequence) tuple is compared
    assert find_consensus_sequence("tests\\data\\CONS_sample_data.fasta") == (
        [[5, 1, 0, 0, 5, 5, 0, 0], [0, 0, 1, 4, 2, 0, 6, 1], [1, 1, 6, 3, 0, 1, 0, 0], [1, 5, 0, 0, 0, 1, 1, 6]],
        ['A', 'T', 'G', 'C', 'A', 'A', 'C', 'T'],
    )
Adding the line from bioinformatics_stronghold.modules.read_fasta import read_fasta_file just gives me a ModuleNotFoundError. Prefixing the module path with . or .. instead results in ImportError: attempted relative import with no known parent package.
CONS.py
from modules.read_fasta import read_fasta_file

def find_consensus_sequence(fasta_location):
    """
    This function will read a given fasta file and extract all sequences using the read_fasta.py module.
    The function will then create a profile matrix as well as a consensus sequence, both as lists.
    Args:
        fasta_location (str): The location of the fasta file as a string.
    Returns:
        profile_matrix (list[lists]): The profile matrix of all given sequences.
        consensus_sequence (list): The consensus sequences of all given sequences.
    """
    fasta_content = read_fasta_file(fasta_location, debug=False)
    # Create a matrix with all sequences
    sequence_matrix = []
    for item in fasta_content:
        sequence_matrix.append(list(item.sequence))
    # print(sequence_matrix)
    # Create the empty profile matrix
    # [A, C, G, T]
    profile_matrix = [[0]*len(sequence_matrix[0]), [0]*len(sequence_matrix[0]), [0]*len(sequence_matrix[0]), [0]*len(sequence_matrix[0])]
    # print(profile_matrix)
    # Add to the nucleotide count depending on the sequence
    for sublist in sequence_matrix:
        for index, nucleotide in enumerate(sublist):
            if nucleotide == "A":
                profile_matrix[0][index] += 1
            if nucleotide == "C":
                profile_matrix[1][index] += 1
            if nucleotide == "G":
                profile_matrix[2][index] += 1
            if nucleotide == "T":
                profile_matrix[3][index] += 1
    # print(profile_matrix)
    consensus_sequence = []
    # NOTE: Ugly solution, but it seems to work. Quite ineffective, but not sure how to improve at this time.
    # For each position in the sequence, check which "letter" is larger than all other
    for index in range(len(profile_matrix[0])):
        if profile_matrix[0][index] > profile_matrix[1][index] and profile_matrix[0][index] > profile_matrix[2][index] and profile_matrix[0][index] > profile_matrix[3][index]:
            consensus_sequence.append("A")
        elif profile_matrix[1][index] > profile_matrix[0][index] and profile_matrix[1][index] > profile_matrix[2][index] and profile_matrix[1][index] > profile_matrix[3][index]:
            consensus_sequence.append("C")
        elif profile_matrix[2][index] > profile_matrix[0][index] and profile_matrix[2][index] > profile_matrix[1][index] and profile_matrix[2][index] > profile_matrix[3][index]:
            consensus_sequence.append("G")
        elif profile_matrix[3][index] > profile_matrix[0][index] and profile_matrix[3][index] > profile_matrix[1][index] and profile_matrix[3][index] > profile_matrix[2][index]:
            consensus_sequence.append("T")
    # print(consensus_sequence)
    return profile_matrix, consensus_sequence
test_CONS.py just won't work. The problem seems to be that the modules folder cannot be found.
Adding an __init__.py to the bioinformatics_stronghold folder does not solve this problem.
If I move the tests folder into the bioinformatics_stronghold folder, pytest just breaks with no apparent error messages and I cannot set up testing in VSCodium.
My question then is:
I think changing the import at the top of CONS.py to a relative import should do it:
from .modules.read_fasta import read_fasta_file
If that doesn't do it, there's probably some sort of import issue in read_fasta.py, and I'd encourage you to comment here with the full error traceback you're seeing, rather than just the error message.
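To see why the leading dot matters, here is a small self-contained sketch. It builds a throwaway copy of your package layout in a temporary folder and imports CONS through the package, which is roughly what happens when the tests import bioinformatics_stronghold.CONS from the project root. The helpers build_demo_project and try_import are names I made up for the demo, the read_fasta_file stub is a stand-in for your real function, and I am assuming the package folder has an __init__.py.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_project(root: Path, import_line: str) -> None:
    """Create a throwaway package mirroring the layout in the question."""
    pkg = root / "bioinformatics_stronghold"
    (pkg / "modules").mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    (pkg / "modules" / "__init__.py").write_text("")
    # Stand-in for the real read_fasta.py (assumption: only the name matters here).
    (pkg / "modules" / "read_fasta.py").write_text(
        "def read_fasta_file(path, debug=False):\n    return []\n"
    )
    # CONS.py contains nothing but the import line under test.
    (pkg / "CONS.py").write_text(import_line + "\n")


def try_import(import_line: str) -> str:
    """Import CONS via the package from the project root and report the outcome."""
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_project(Path(tmp), import_line)
        sys.path.insert(0, tmp)  # the project root is what ends up on sys.path
        try:
            importlib.invalidate_caches()
            importlib.import_module("bioinformatics_stronghold.CONS")
            return "ok"
        except ModuleNotFoundError as exc:
            return f"ModuleNotFoundError: {exc.name}"
        finally:
            # Undo the path change and forget the demo modules again.
            sys.path.remove(tmp)
            for name in list(sys.modules):
                if name == "modules" or name.startswith(
                    ("bioinformatics_stronghold", "modules.")
                ):
                    del sys.modules[name]


print(try_import("from modules.read_fasta import read_fasta_file"))
print(try_import("from .modules.read_fasta import read_fasta_file"))
```

The first call fails because Python looks for a top-level package called modules on sys.path, and only the project root is there. The second call succeeds because the dot tells Python to look inside the package that CONS itself belongs to.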
Note: Your module names (CONS.py, IEV.py) do not follow PEP 8 guidelines.
Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability.
Edit: Here's an example of how to structure your project and make it callable.
Rosalind-problems/
├─ bioinformatics_stronghold/
│ ├─ data/
│ ├─ modules/
│ │ ├─ __init__.py
│ │ ├─ read_fasta.py
│ ├─ __main__.py
│ ├─ CONS.py
│ ├─ IEV.py
├─ tests/
│ ├─ __init__.py
│ ├─ test_CONS.py
│ ├─ test_IEV.py
__main__.py
from .modules import read_fasta
read_fasta.call_a_function()
To execute this, you just type python -m bioinformatics_stronghold in the terminal from the Rosalind-problems folder. With a single main entrypoint, you can do all sorts of things, like accepting user input, adding an argparse interface, etc.
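As a rough idea of what an argparse interface in __main__.py could look like, here is a minimal sketch. The argument names (problem, input_path), the helper names and the sample path are my own choices for illustration, not anything your project already defines, and the sketch only reports what it would run instead of calling your solvers.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build the command-line interface (argument names are illustrative)."""
    parser = argparse.ArgumentParser(
        prog="bioinformatics_stronghold",
        description="Run one Rosalind solution on an input file.",
    )
    parser.add_argument("problem", choices=["CONS", "IEV"], help="which solution to run")
    parser.add_argument("input_path", help="path to the problem's input file")
    return parser


def main(argv=None) -> str:
    """Parse the arguments and describe what would be run."""
    args = build_parser().parse_args(argv)
    # Inside the real package you would dispatch here instead, for example
    # `from .CONS import find_consensus_sequence` and call it with args.input_path.
    return f"would run {args.problem} on {args.input_path}"


# Passing the arguments explicitly keeps the sketch runnable outside a terminal;
# the path below is a hypothetical example.
print(main(["CONS", "data/CONS_sample_data.fasta"]))
```

In the actual __main__.py you would call main() without arguments so that argparse reads them from the command line, e.g. python -m bioinformatics_stronghold CONS followed by the path to your input file.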
More reading: https://docs.python.org/3/library/__main__.html