I have some data that looks something like this:
ID1 ID2 ID3
ID1 ID4 ID5
ID3 ID5 ID7 ID6
...
...
where each row is a group.
My goal is to have a dictionary for each ID, followed by a set of the other IDs that share >= 1 group with it.
For example, this data would return {ID1: [ID2, ID3, ID4, ID5], ID2:[ID1, ID3] ... }
I can think of 3 options for this, and I'm wondering which is (generally) best:
TL;DR: Go with option 2. Just use sets from the start.
In Python, sets are hash-sets, and lists are dynamic arrays. Inserting is O(1)
for both, but checking if an element exists is O(n)
for the list and O(1)
for the set.
So option 1 is immediately out. If you are inserting n
items and need to check the list every time, then the overall complexity becomes O(n^2)
.
Options 2 and 3 are both optimal at O(n)
overall. Option 2 might be faster in micro-benchnarks because you don't need to move objects between collections. In practice, choose the option that is easier to read and maintain in your specific circumstance.