The goal is to match sequence data from FASTA information and write the names and sequences into individual files.
I have two dictionaries, and I want to match dict_1's keys against dict_2's values. Then I want to create an output file named after each dict_2 key that contains the corresponding dict_1 items, separated.
Here are the dictionaries:
dict_1 = {'NODE_116_length_11385_cov_7.599029_12': 'DMVDMVDMVDMVDMVDMVDMVVYMMNMETYMMDIIK*',
'NODE_102_length_12880_cov_14.047719_19': 'EIKEEIKEEIKEEIKEEIKEEIELVILNEVKIYLNGKTTILKSKEYLKRMNERTNGNKEELLERLSKLIKIDL*',
'NODE_105_length_12431_cov_10.730204_16': 'FFDYDKDGDLDMILINQSAPEYAKGQIQNLE*',
'NODE_92_length_13700_cov_7.926786_1': 'GKLLLDNLENLKNVDLILMDLHMPIMDGYDCTKKIRKLGYKMPIIASTANAMSGEKEKCLNIGMNDFLLKPVQLKTFKDIIHKWLI*',
'NODE_111_length_11631_cov_12.297685_1': 'GYSQDEQQMANDELKSASKHTEQKILSTIEEVKDDKETKKDIEYELKSTSAIGQHDSLFE*',
'NODE_85_length_14730_cov_9.298399_1': 'HNIHHNIHHNNNNNNNIHHIDHRVFFYTQLQIFFIFHYFIMVHNTQIILIRHAEKKKGTHLSLEGIIRSNELVNFFINQYNPNINIPDIIIAMKQHKKSSNRAFETIQPLANTLNINIIHDFYKNDIKQLHDFIQLHLDKNILICWEHKVLIDITNTITHLKKLFWKKKQYEPIWIINSFNKTFQIFNQFKIINQTIDYSNFKINPIKTLHYN*',
'NODE_56_length_20640_cov_12.217877_21': 'MEEIVNYSKQYGKNQKTEAFEYADNHNLQCFQRDLNESGAKILIVDSYQNIFDSIKNSLNSNYYEYWSSTQPIKFYIDYDNKVENVDQNDLKKRAKGDIISTHKTDILNIINTVRTLIPNITGVNILKSIPDITKKSYHLIFDGIHFANRGILKKFIEDHLKPKFKDLFEKKIIDIKVYGDLCFRTLLSTKSGQNRPLYLLQTDSFLLELQENAISKENTTIEHFLKVSISHIDKDSTLFTYKSEKKKNNSKKVHLMNEDDIYSDKEIVKKYLDLLDGDRYTDYNKWLNIGFILFSINTEYIDLWHYFSNKWEHYDEENCNSKWNTFASSEYVHTINNLIHLAKIDNPDDYEELSKEVPNHDIKYLRPFDNVLSKLIYRIYGEKFVCSNPLKDEWYYFNSIRWKKENKSFNLRHKITNEVFTKIENYRRILIKEGASEEIIKNYHNILQKLGSGIKLNCLEIEFYNEKFYTIIDQNKDLIGFENGIFDLKIMEFRNGVSSDYVSLSTQYDYVYYSPEEPIYKEVSLLISQIIPNPETRHFTMKSLASCLDGHNRDENFYIWSGKNATGGNGKSTITELLSKALGEYAIDSPVSLITGKRESANSANSALASIRNKRVVIMQEPGANEQIQSDVMKSLTGGDKVSTRELNSSQIEFKPHAKIFMACNQIPILSTNDGGTSRRIKIIEFESRFVETPTEGTPVKEFKIDRELKNKLEKYKPVFMSILLDYYKIYIEEKLIPPNSVLKVTKKYESSNNNVKMFIDENIIKGTKTDFIIKEELKVLYRSDISLTRSFPRFSIFVTQFESIFGTEFVFDAKKRLYKFYGYHLKRPGDNSDDENTNNLDNSEDEF*',
'NODE_93_length_13622_cov_12.830766_11': 'IYLNDTTTSGTNGSLIHQNIFRVNQATQNTPVYDSITQTLGNATFTIGMFYKNLSTVKANLNISNAAIRLYRIQ*',
'NODE_124_length_10814_cov_8.548657_12': 'LDANFLDANVLDADFLDANFLDANVLDADFLDADFLERDIVVIFINCK*'}
and
dict_2 = {'MGs12_5k_2_A32': ['NODE_70_length_20145_cov_24.475261_14'],
'MGs12_5k_2_D5': ['NODE_2_length_52708_cov_24.298236_22'],
'MGs12_5k_2_PolB': ['NODE_32_length_24566_cov_24.203541_4'],
'MGs12_5k_2_RNAPL': ['NODE_3_length_51209_cov_24.258005_34',
'NODE_3_length_51209_cov_24.258005_30',
'NODE_3_length_51209_cov_24.258005_32'],
'MGs12_5k_2_RNAPS': ['NODE_50_length_21518_cov_25.799376_1',
'NODE_2_length_52708_cov_24.298236_1'],
'MGs12_5k_2_RNR': ['NODE_7_length_40427_cov_25.036238_31'],
'MGs12_5k_2_SFII': ['NODE_7_length_40427_cov_25.036238_8'],
'MGs12_5k_2_VLTF3': ['NODE_2_length_52708_cov_24.298236_25'],
'MGs12_5k_2_mRNAc': ['NODE_7_length_40427_cov_25.036238_11',
'NODE_50_length_21518_cov_25.799376_17'],
'MGs27_5k_1_A32': ['NODE_116_length_11385_cov_7.599029_5',
'NODE_103_length_12754_cov_11.677455_12'],
'MGs27_5k_1_D5': ['NODE_56_length_20640_cov_12.217877_21',
'NODE_85_length_14730_cov_9.298399_8',
'NODE_86_length_14611_cov_12.522121_7'],
'MGs27_5k_1_PolB': ['NODE_124_length_10814_cov_8.548657_2',
'NODE_65_length_19237_cov_10.992128_2'],
'MGs27_5k_1_SFII': ['NODE_93_length_13622_cov_12.830766_8'],
'MGs27_5k_1_VLTF3': ['NODE_65_length_19237_cov_10.992128_15'],
'MGs27_5k_1_mRNAc': ['NODE_141_length_10084_cov_14.000897_1'],
'MGs27_5k_1_mcp': ['NODE_86_length_14611_cov_12.522121_2',
'NODE_113_length_11459_cov_7.893722_14']}
I tried the following, based on these answers:
https://stackoverflow.com/questions/53239262/nested-dictionary-from-dict1-and-dict2-using-keys-from-dict1-and-values-from-dic
https://stackoverflow.com/questions/1317410/finding-matching-keys-in-two-large-dictionaries-and-doing-it-fast
https://stackoverflow.com/questions/32815640/how-to-get-the-difference-between-two-dictionaries-in-python
for k, v in dict_2.items():
    print(k, v)
for v in dict_1.keys():
    print(dict_1.values())
I can't get past confirming the matches and printing each dict_2 key with the matching dict_1 values. In the end I would like files named after the dict_2 keys, like this:
MGs27_5k_1_D5.txt
>NODE_56_length_20640_cov_12.217877_21
MEEIVNYSKQYGKNQKTEAFEYADNHNLQCFQRDLNESG
AKILIVDSYQNIFDSIKNSLNSNYYEYWSSTQPIKFYID
YDNKVENVDQNDLKKRAKGDIISTHKTDILNIINTVRT...
>NODE_85_length_14730_cov_9.298399_8
MEDFTIAKQYGKNQKVEAFEYAENHNIQCFQKDLNESGAKILIADSY
LNIFNLIKNGMNANYYEYWSSTQQVKFYIDYDNKVENIDFNDLKKRS
KNIDVVSTHKTDLL...
>NODE_86_length_14611_cov_12.522121_10
MKEKFIWEFLDEEWSDLLLS...
(It should be the whole sequence; I used the ... to save space.)
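For the "confirming the matches" step, one option is a plain membership test for every ID listed under each dict_2 key. A minimal sketch on small stand-in dictionaries (the data and the helper name `matching_ids` are made up for illustration and are not my real input):

```python
def matching_ids(seqs, groups):
    """Map each group name to the IDs in its list that are also keys of seqs."""
    matches = {}
    for group, ids in groups.items():
        # Keep the order of the original list while filtering by membership.
        found = [node_id for node_id in ids if node_id in seqs]
        if found:
            matches[group] = found
    return matches

# Stand-in data with the same shape as dict_1 and dict_2.
seqs = {'NODE_1_1': 'MKV*', 'NODE_2_5': 'GYS*'}
groups = {'S1_D5': ['NODE_1_1', 'NODE_9_9'], 'S1_A32': ['NODE_7_7']}

print(matching_ids(seqs, groups))
```

Groups with no match are left out of the result, which makes it easy to see which files would actually be written.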
This is the final answer, thanks to the accepted comment:
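def fileWrite(fileName, nodeName, fileContents):
    print(f'writing >{nodeName} {fileContents} into {fileName + ".txt"}')
    # Append ('a') rather than truncate ('w+'), so a file that receives
    # several matching nodes keeps all of them, not just the last one.
    with open(fileName + ".txt", 'a') as file:
        file.write('>' + nodeName + '\n')
        file.write(fileContents + '\n')

# For every dict_2 entry, write each dict_1 sequence whose name is in its list.
for k2, v2 in dict_2.items():
    for k1 in dict_1:
        if k1 in v2:
            fileWrite(k2, k1, dict_1[k1])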
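fileWrite puts each sequence on a single line, while the layout I wanted wraps the sequence over several lines. A minimal sketch of that wrapping, assuming a fixed line width (the helper name `format_fasta_record` and the default width of 60 are my own choices for illustration, not part of the accepted code):

```python
def format_fasta_record(name, sequence, width=60):
    """Return a '>' header line followed by the sequence in chunks of `width`."""
    # Slice the sequence into consecutive chunks of at most `width` characters.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return '>' + name + '\n' + '\n'.join(lines) + '\n'

# Tiny made-up record, wrapped at 4 characters so the effect is visible.
record = format_fasta_record('NODE_1_1', 'ABCDEFGHIJ', width=4)
print(record, end='')
```

The returned string could be passed to a single file.write call in place of the two separate writes.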
First, use readable values in your dictionaries when presenting them to others; just looking at these was a headache. Show only the skeleton structure of the dictionaries.
Second, this is a traverse-and-search problem over dictionaries: loop through the keys and values of both dictionaries and write a file with the content you need.
So here is the final code:
dict1 = {'A': 'A1', 'B': 'B1', 'C': 'C1'}
dict2 = {'F1': ['A', 'G'], 'F2': ['D'], 'F3': ['E', 'I3']}

def fileWrite(fileName, fileContents):
    print(f'writing {fileContents} into {fileName + ".txt"}')
    # 'a+' appends, so several matches for one file end up in the same file.
    with open(fileName + ".txt", 'a+') as file:
        file.write(fileContents + '\n')

for k2, v2 in dict2.items():
    for k1 in dict1:
        if k1 in v2:
            fileWrite(k2, dict1[k1])
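One possible variation, offered as a sketch and not a replacement for the code above: because the file is opened in append mode, running the script twice duplicates the lines. Collecting all matches for a file first and then writing that file once avoids this. The function name `write_matches` and the `out_dir` parameter are my own assumptions for illustration.

```python
from pathlib import Path
import tempfile

def write_matches(dict1, dict2, out_dir='.'):
    """Write one '<key>.txt' per dict2 key whose list contains dict1 keys."""
    out_dir = Path(out_dir)
    written = []
    for file_name, wanted in dict2.items():
        # Collect every match for this file before touching the disk.
        contents = [dict1[k] for k in wanted if k in dict1]
        if not contents:
            continue  # no match: do not create an empty file
        path = out_dir / (file_name + '.txt')
        # write_text truncates, so rerunning does not duplicate lines.
        path.write_text('\n'.join(contents) + '\n')
        written.append(path)
    return written

dict1 = {'A': 'A1', 'B': 'B1', 'C': 'C1'}
dict2 = {'F1': ['A', 'G'], 'F2': ['D'], 'F3': ['E', 'I3']}

# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    for path in write_matches(dict1, dict2, tmp):
        print(path.name, repr(path.read_text()))
```

Looking up each wanted key directly in dict1 also avoids scanning all of dict1 for every entry of dict2, which may matter when the dictionaries are large.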