I am collecting protein sequence IDs from this website: https://www.uniprot.org/
I've written this code:
import urllib.parse
import urllib.request

url = 'https://www.uniprot.org/uploadlists/'
params = {
    'from': 'ID',
    'to': 'UPARC',
    'format': 'tab',
    'query': 'P00766 P40925'
}
data = urllib.parse.urlencode(params)
data = data.encode('utf-8')
req = urllib.request.Request(url, data)
with urllib.request.urlopen(req) as f:
    response = f.read()
string_it = response.decode('utf-8')
print(string_it)
When I print the resulting string, I get an output that looks like this:
From To
P00766 UPI000011047C
P40925 UPI0000167B3E
How do I convert this to a dictionary?
Split the string into lines, then split each line on the tab character and use the two fields as key and value. The code is as follows:
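# Split the response into lines and drop the empty ones
string_list = string_it.split("\n")
string_list = [i for i in string_list if i != ""]
dict_values = {}
# Skip the header line, then split each row on the tab delimiter
for i in string_list[1:]:
    fields = i.split("\t")
    dict_values[fields[0]] = fields[1]
print(dict_values)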
The output is:
{'P00766': 'UPI000011047C', 'P40925': 'UPI0000167B3E'}
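Code walk through:
Split the string on newlines and drop the empty strings.
Skip the first line, which holds the header fields From and To.
Split each remaining line on \t, the delimiter, and add the values into the dictionary.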
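If you need this in more than one place, the same steps can be wrapped in a small function. This is only a sketch: the name parse_mapping and the hard-coded sample string are my own (hypothetical, standing in for string_it), and it assumes every data row has at least two tab-separated fields, skipping rows that do not.

```python
def parse_mapping(text):
    """Turn a tab-separated 'From<TAB>To' table into a dict.

    parse_mapping is a hypothetical helper name, not part of any library.
    """
    # splitlines() handles both "\n" and "\r\n"; drop blank lines
    lines = [line for line in text.splitlines() if line.strip()]
    mapping = {}
    for line in lines[1:]:  # lines[0] is the "From / To" header
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # assumption: silently skip malformed rows
        mapping[fields[0]] = fields[1]
    return mapping


# Sample text standing in for string_it from the question
sample = "From\tTo\nP00766\tUPI000011047C\nP40925\tUPI0000167B3E\n"
print(parse_mapping(sample))
```

With the real response you would call parse_mapping(string_it) instead of passing the sample.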