In Python, I have a list of tuples (`lot`) with patient data, as shown below:
lot = [('490001', 'A-ARM1', '1', '2', "a", "b"),
('490001', 'A-ARM2', '3', '4', "c", "d"),
('490002', 'B-ARM3', '5', '6', "e", "f")]
In my real dataset, `lot` contains 50-150 tuples (depending on the patient). I loop over the second element of every tuple and want to replace the 'A-' and 'B-' prefixes with a dictionary value, so the output becomes:
[('490001', 'ZZARM1', '1', '2', 'a', 'b'), ('490001', 'ZZARM2', '3', '4', 'c', 'd'), ('490002', 'XXARM3', '5', '6', 'e', 'f')]
I've written the code below to do this, but I'm wondering whether there is a cleaner (shorter) way of writing it, in particular the `lot2` line. The code should also perform well on a large list of tuples, as described above. I'm eager to learn from you!
from more_itertools import grouper

dict = {'A-': 'ZZ', 'B-': 'XX'}
for el1, el2, *rest in lot:
    for i, j in grouper(el2, 2):
        if i + j in dict:
            lot2 = [(tpl[0], tpl[1].replace(tpl[1][:2], dict[tpl[1][:2]]), tpl[2], tpl[3], tpl[4], tpl[5]) for tpl in lot]
print(lot2)
If you're looking for shorter code, here's a version that doesn't use `more_itertools.grouper`. Basically, iterate over `lot` and modify the second element as you go (if it needs to be changed). Note that I renamed `dict` to `dct` here; `dict` is the built-in dict constructor, and naming your variables after Python built-ins causes problems if you later want to use the `dict` constructor.
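A minimal sketch of that problem: once the name `dict` is bound to a mapping, calling it as a constructor fails, although the real built-in is still reachable through the standard `builtins` module.

```python
import builtins

dict = {'A-': 'ZZ', 'B-': 'XX'}  # shadows the built-in name in this module

message = None
try:
    dict(a=1)  # `dict` now refers to a dict instance, not the dict type
except TypeError as exc:
    message = str(exc)  # CPython reports that a 'dict' object is not callable

dct = {'A-': 'ZZ', 'B-': 'XX'}   # a non-shadowing name avoids the clash
restored = builtins.dict(a=1)    # the real constructor is still available here
```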
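dct = {'A-': 'ZZ', 'B-': 'XX'}

lot2 = []
for el1, el2, *rest in lot:
    prefix = el2[:2]                           # first two characters, e.g. 'A-'
    el2 = dct.get(prefix, prefix) + el2[2:]    # swap the prefix if it has a mapping
    lot2.append((el1, el2, *rest))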
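The `dct.get(prefix, prefix)` call returns the prefix itself when it has no mapping, so tuples with other prefixes pass through unchanged instead of raising `KeyError`. A minimal check, using a hypothetical 'C-' row that is not part of the original data:

```python
dct = {'A-': 'ZZ', 'B-': 'XX'}

# Hypothetical rows: the second one uses a 'C-' prefix with no entry in dct.
rows = [('490001', 'A-ARM1', '1', '2', 'a', 'b'),
        ('490003', 'C-ARM4', '7', '8', 'g', 'h')]

lot2 = []
for el1, el2, *rest in rows:
    prefix = el2[:2]
    el2 = dct.get(prefix, prefix) + el2[2:]  # unmapped prefixes map to themselves
    lot2.append((el1, el2, *rest))

print(lot2)
# [('490001', 'ZZARM1', '1', '2', 'a', 'b'), ('490003', 'C-ARM4', '7', '8', 'g', 'h')]

# Plain indexing, as in the question's dict[tpl[1][:2]], raises for the same prefix.
try:
    dct['C-']
    raised = False
except KeyError:
    raised = True
```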
This can be written even more concisely as a list comprehension:
lot2 = [(el1, dct.get(el2[:2], el2[:2]) + el2[2:], *rest) for el1, el2, *rest in lot]
Output:
[('490001', 'ZZARM1', '1', '2', 'a', 'b'),
('490001', 'ZZARM2', '3', '4', 'c', 'd'),
('490002', 'XXARM3', '5', '6', 'e', 'f')]
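Slicing with `el2[:2]` and `el2[2:]` also sidesteps a subtle issue with the `str.replace` call in the question: `str.replace` substitutes every occurrence of the substring, not just the leading one. A small illustration, using a hypothetical value in which the prefix appears twice:

```python
dct = {'A-': 'ZZ', 'B-': 'XX'}
value = 'A-ARMA-1'  # hypothetical value: 'A-' occurs at the start and in the middle

# Question's approach: every 'A-' is replaced, including the inner one.
via_replace = value.replace(value[:2], dct[value[:2]])

# Slicing approach: only the first two characters are swapped.
via_slice = dct.get(value[:2], value[:2]) + value[2:]

print(via_replace)  # ZZARMZZ1
print(via_slice)    # ZZARMA-1
```

Passing a count of 1 to `str.replace` would also limit it to the first match, but the slicing version additionally handles unmapped prefixes without a `KeyError`.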