I have a dictionary containing transactions like so:
transactions = {
    "T1": ["A", "B", "C", "E"],
    "T2": ["A", "D", "E"],
    "T3": ["B", "C", "E"],
    "T4": ["B", "C", "D", "E"],
    "T5": ["B", "D", "E"]
}
I then have an items list as so:
items = [('B', 'C'), ('B', 'D'), ('B', 'E'), ('C', 'D'), ('C', 'E'), ('D', 'E')]
What I am trying to figure out is how to count the number of transactions in the dictionary that contain each of these itemsets. For example, in this scenario I would return a dictionary that looks something like:
{('B','C'): 3, ('B', 'D'): 2, ('B', 'E'): 4, ('C', 'D'): 1, ('C', 'E'): 3, ('D', 'E'): 3}
I have the following function:
def get_num_occurrences(items, transactions):
    occurr = dict()
    for x in items:
        occurr[x] = 0
    for transaction in transactions.values():
        for item in transaction:
            occurr[item] += 1
    return occurr
This works for 1-item itemsets (for example, if the list were items = ["A", "B", "C", "D", "E"]). But I cannot figure out how to apply the same approach to 2-item itemsets (items = [('B', 'C'), ('B', 'D'), ('B', 'E'), ('C', 'D'), ('C', 'E'), ('D', 'E')]) or to 3-item itemsets (items = [('B', 'C', 'D'), ('C', 'D', 'E'), ('A', 'C', 'E')]), and so on.
Use sets to determine if your item is a subset of the transaction.
transactions = {
    "T1": ["A", "B", "C", "E"],
    "T2": ["A", "D", "E"],
    "T3": ["B", "C", "E"],
    "T4": ["B", "C", "D", "E"],
    "T5": ["B", "D", "E"]
}
items = [('B', 'C'), ('B', 'D'), ('B', 'E'), ('C', 'D'), ('C', 'E'), ('D', 'E')]
result = {}
for item in items:
    count = 0
    for transaction in transactions.values():
        if set(item).issubset(set(transaction)):
            count += 1
    result[item] = count
print(result)
The result is {('B', 'C'): 3, ('B', 'D'): 2, ('B', 'E'): 4, ('C', 'D'): 1, ('C', 'E'): 3, ('D', 'E'): 3}.
With a dictionary comprehension you can write all of this in one line.
result = {item: sum(set(item).issubset(set(t)) for t in transactions.values()) for item in items}
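If you want to keep the get_num_occurrences(items, transactions) signature from the question, the same subset test can be wrapped in a function. This is only a sketch: the transaction_sets name is my own, and it assumes every itemset is a tuple of items. It also converts each transaction to a set once up front, rather than once per itemset, which should help when there are many itemsets.

```python
def get_num_occurrences(items, transactions):
    # Build each transaction's set once, not once per itemset.
    transaction_sets = [set(t) for t in transactions.values()]
    # `set(item) <= t` is the subset test; True counts as 1 when summed.
    return {
        item: sum(set(item) <= t for t in transaction_sets)
        for item in items
    }


transactions = {
    "T1": ["A", "B", "C", "E"],
    "T2": ["A", "D", "E"],
    "T3": ["B", "C", "E"],
    "T4": ["B", "C", "D", "E"],
    "T5": ["B", "D", "E"],
}

# Itemsets of any size work, as long as each one is a tuple.
pairs = [('B', 'C'), ('B', 'D'), ('B', 'E'), ('C', 'D'), ('C', 'E'), ('D', 'E')]
triples = [('B', 'C', 'D'), ('C', 'D', 'E'), ('A', 'C', 'E')]

pair_counts = get_num_occurrences(pairs, transactions)
triple_counts = get_num_occurrences(triples, transactions)
print(pair_counts)
print(triple_counts)
```

One caveat: for 1-item itemsets, pass one-element tuples such as ('A',) rather than bare strings. set("A") happens to work, but a multi-character item name like "AB" would be split into separate characters.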