
Python Dictionary: Checking if items in a list are present in any of the lists in the dictionary


I have a dictionary containing transactions like so:

transactions = {
    "T1": ["A", "B", "C", "E"],
    "T2": ["A", "D", "E"],
    "T3": ["B", "C", "E"],
    "T4": ["B", "C", "D", "E"],
    "T5": ["B", "D", "E"]
}

I then have a list of items like so:

items = [('B', 'C'), ('B', 'D'), ('B', 'E'), ('C', 'D'), ('C', 'E'), ('D', 'E')]

What I am trying to figure out is how to count, for each of these itemsets, how many transactions in the dictionary contain all of its items. For example, in this scenario I would return a dictionary like:

{('B','C'): 3, ('B', 'D'): 2, ('B', 'E'): 4, ('C', 'D'): 1, ('C', 'E'): 3, ('D', 'E'): 3}

I have the following function:

def get_num_occurrences(items, transactions):
    occurr = dict()
    for x in items:
        occurr[x] = 0
    for transaction in transactions.values():
        for item in transaction:
            occurr[item] += 1
    return occurr

This works for single-item itemsets (i.e. if the list of items were instead items = ["A", "B", "C", "D", "E"]). However, I cannot figure out how to apply the same approach to 2-item itemsets (items = [('B', 'C'), ('B', 'D'), ('B', 'E'), ('C', 'D'), ('C', 'E'), ('D', 'E')]), or to 3-item itemsets (items = [('B', 'C', 'D'), ('C', 'D', 'E'), ('A', 'C', 'E')]), and so on.


Solution

  • Convert each itemset and each transaction to a set, then check whether the itemset is a subset of the transaction.

    transactions = {
        "T1": ["A", "B", "C", "E"],
        "T2": ["A", "D", "E"],
        "T3": ["B", "C", "E"],
        "T4": ["B", "C", "D", "E"],
        "T5": ["B", "D", "E"]
    }
    
    items = [('B', 'C'), ('B', 'D'), ('B', 'E'), ('C', 'D'), ('C', 'E'), ('D', 'E')]
    
    result = {}
    for item in items:
        count = 0
        for transaction in transactions.values():
            if set(item).issubset(set(transaction)):
                count += 1
        result[item] = count
    
    print(result)
    

    The result is {('B', 'C'): 3, ('B', 'D'): 2, ('B', 'E'): 4, ('C', 'D'): 1, ('C', 'E'): 3, ('D', 'E'): 3}.
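    The loop above rebuilds set(transaction) once for every itemset. When many itemsets are checked, each transaction only needs to be converted to a set once. The sketch below wraps the same subset test in a function; the name count_support is hypothetical (not part of the original answer), and it assumes every itemset is a tuple, so single items should be passed as one-element tuples like ('A',) rather than bare strings (set("AB") would split a multi-character string into letters).

```python
def count_support(itemsets, transactions):
    """Count how many transactions contain every item of each itemset.

    Works for itemsets of any size, as long as each itemset is a tuple.
    """
    # Convert each transaction to a set once, instead of once per itemset.
    transaction_sets = [set(t) for t in transactions.values()]
    result = {}
    for itemset in itemsets:
        needed = set(itemset)
        # "needed <= t" is the operator form of needed.issubset(t).
        result[itemset] = sum(1 for t in transaction_sets if needed <= t)
    return result


transactions = {
    "T1": ["A", "B", "C", "E"],
    "T2": ["A", "D", "E"],
    "T3": ["B", "C", "E"],
    "T4": ["B", "C", "D", "E"],
    "T5": ["B", "D", "E"]
}

pairs = [('B', 'C'), ('B', 'D'), ('B', 'E'), ('C', 'D'), ('C', 'E'), ('D', 'E')]
triples = [('B', 'C', 'D'), ('C', 'D', 'E'), ('A', 'C', 'E')]

print(count_support(pairs, transactions))
# {('B', 'C'): 3, ('B', 'D'): 2, ('B', 'E'): 4, ('C', 'D'): 1, ('C', 'E'): 3, ('D', 'E'): 3}
print(count_support(triples, transactions))
# {('B', 'C', 'D'): 1, ('C', 'D', 'E'): 1, ('A', 'C', 'E'): 1}
```

    The same function handles the 1-, 2- and 3-item cases from the question without any change.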


    With a dictionary comprehension you can write all of this in one line (sum() counts each True as 1 and each False as 0):

    result = {item: sum(set(item).issubset(set(t)) for t in transactions.values()) for item in items}
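    A different technique, not the one used above: instead of testing each requested itemset against every transaction, enumerate all k-item combinations of each transaction with itertools.combinations and tally them in a collections.Counter. This is a sketch; the function name count_k_itemsets is hypothetical, and it assumes all requested itemsets have the same size k. Because each transaction is sorted before combining, the Counter keys are sorted tuples, so the example sorts each requested itemset before looking it up.

```python
from collections import Counter
from itertools import combinations


def count_k_itemsets(transactions, k):
    """Count every k-item combination that occurs in the transactions."""
    counts = Counter()
    for t in transactions.values():
        # set() drops duplicate items; sorted() makes combinations() yield
        # tuples in a consistent (sorted) order.
        counts.update(combinations(sorted(set(t)), k))
    return counts


transactions = {
    "T1": ["A", "B", "C", "E"],
    "T2": ["A", "D", "E"],
    "T3": ["B", "C", "E"],
    "T4": ["B", "C", "D", "E"],
    "T5": ["B", "D", "E"]
}

items = [('B', 'C'), ('B', 'D'), ('B', 'E'), ('C', 'D'), ('C', 'E'), ('D', 'E')]
pair_counts = count_k_itemsets(transactions, 2)

# Indexing a Counter with a missing key returns 0, so itemsets that never
# occur together still get a count.
result = {item: pair_counts[tuple(sorted(item))] for item in items}
print(result)
# {('B', 'C'): 3, ('B', 'D'): 2, ('B', 'E'): 4, ('C', 'D'): 1, ('C', 'E'): 3, ('D', 'E'): 3}
```

    One pass counts every k-itemset at once, which is convenient when you need counts for all candidates. However, a transaction with n items produces C(n, k) combinations, so this grows quickly for long transactions or larger k; the subset check above scales with the number of requested itemsets instead.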