python algorithm recursion data-structures hierarchical-data

Algorithm for compound fractions


I have a set of N chemical compounds numbered 1, 2, ..., N. For each compound, I have the fraction of each of its constituents ("A", "B", and so on). A compound can also contain other compounds, in which case the fraction of that sub-compound is given. For instance, for N = 5, a sample set is

mixes = {
    1: {
        "A": 0.32,
        "B": 0.12,
        "C": 0.15,
        2: 0.41
    },
    2: {
        "C": 0.23,
        "D": 0.12,
        "E": 0.51,
        4: 0.14
    },
    3: {
        "A": 0.24,
        "E": 0.76
    },
    4: {
        "B": 0.13,
        "F": 0.01,
        "H": 0.86
    },
    5: {
        "G": 0.1,
        2: 0.4,
        3: 0.5
    }
}

I would like an algorithm that gives the net fraction of each constituent in every compound, i.e.

mixes = {
    1: {
        "A": 0.32,
        "B": 0.12 + 0.41 * 0.14 * 0.13,
        "C": 0.15 + 0.41 * 0.23,
        "D": 0.41 * 0.12,
        "E": 0.41 * 0.51,
        "F": 0.41 * 0.14 * 0.01,
        "H": 0.41 * 0.14 * 0.86
    },
    2: {
        "B": 0.14 * 0.13,
        "C": 0.23,
        "D": 0.12,
        "E": 0.51,
        "F": 0.14 * 0.01,
        "H": 0.14 * 0.86
    },
    3: {
        "A": 0.24,
        "E": 0.76
    },
    4: {
        "B": 0.13,
        "F": 0.01,
        "H": 0.86
    },
    5: {
        "A": 0.5 * 0.24,
        "G": 0.1,
        "B": 0.4 * 0.14 * 0.13,
        "C": 0.4 * 0.23,
        "D": 0.4 * 0.12,
        "E": 0.4 * 0.51 + 0.5 * 0.76,
        "F": 0.4 * 0.14 * 0.01,
        "H": 0.4 * 0.14 * 0.86
    }
}

My current approach involves recursion, but I'd like to know if there's a clever way to do this. Perhaps a tree-like data structure would help?

EDIT: for simplicity, assume that there are no cyclic relationships in the dataset.


Solution

  • Here is a recursive solution. I noticed that the original mix ratios for each compound add up to 1, so the code also checks that the resolved ratios for each compound add up to 1:

    from pprint import pprint
    from collections import defaultdict
    
    mixes = {
        1: {
            'A': 0.32,
            'B': 0.12,
            'C': 0.15,
            2: 0.41
        },
        2: {
            'C': 0.23,
            'D': 0.12,
            'E': 0.51,
            4: 0.14
        },
        3: {
            'A': 0.24,
            'E': 0.76
        },
        4: {
            'B': 0.13,
            'F': 0.01,
            'H': 0.86
        },
        5: {
            'G': 0.1,
            2: 0.4,
            3: 0.5
        }
    }
    
    def resolve(compound):
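        '''Yield (component, ratio) pairs for a compound, with the ratios of
        nested compounds scaled by their share of the parent. A component may
        be yielded more than once; the caller sums the duplicates.
        '''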
        for component, ratio in mixes[compound].items():
            # If the component is another compound, recursively report its ratios.
            if isinstance(component, int):
                for subcomponent, subratio in resolve(component):
                    yield subcomponent, subratio * ratio
            else:
                yield component, ratio
    
    # A dictionary of dictionaries with default 0.0 float values.
    result = defaultdict(lambda: defaultdict(float))
    
    # Resolve each compound and add up component ratios
    for compound in mixes:
        for component, ratio in resolve(compound):
            result[compound][component] += ratio
    
    pprint(result, width=1)
    
    # Check that the ratios of each resolved compound add up to 1
    for compound, components in result.items():
        print(compound, sum(components.values()))
    

    Output:

    defaultdict(<function <lambda> at 0x00000214CD56FB00>,
                {1: defaultdict(<class 'float'>,
                                {'A': 0.32,
                                 'B': 0.127462,
                                 'C': 0.2443,
                                 'D': 0.049199999999999994,
                                 'E': 0.20909999999999998,
                                 'F': 0.0005740000000000001,
                                 'H': 0.049364}),
                 2: defaultdict(<class 'float'>,
                                {'B': 0.0182,
                                 'C': 0.23,
                                 'D': 0.12,
                                 'E': 0.51,
                                 'F': 0.0014000000000000002,
                                 'H': 0.12040000000000001}),
                 3: defaultdict(<class 'float'>,
                                {'A': 0.24,
                                 'E': 0.76}),
                 4: defaultdict(<class 'float'>,
                                {'B': 0.13,
                                 'F': 0.01,
                                 'H': 0.86}),
                 5: defaultdict(<class 'float'>,
                                {'A': 0.12,
                                 'B': 0.007280000000000001,
                                 'C': 0.09200000000000001,
                                 'D': 0.048,
                                 'E': 0.5840000000000001,
                                 'F': 0.0005600000000000001,
                                 'G': 0.1,
                                 'H': 0.04816000000000001})})
    1 1.0
    2 1.0
    3 1.0
    4 1.0
    5 1.0
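
On the question of whether there is a smarter way than plain recursion: the generator above resolves a shared sub-compound again each time it is referenced (compound 2 is expanded for 1, for 5, and for itself). One possible variation, sketched below, caches each flattened compound so it is computed only once, and also raises an error if the "no cycles" assumption from the question turns out to be false. The names `flatten_all`, `flatten`, `cache` and `in_progress` are made up for this sketch, and it assumes that a component is a sub-compound exactly when it appears as a key of `mixes` (rather than testing for `int`).

```python
mixes = {
    1: {"A": 0.32, "B": 0.12, "C": 0.15, 2: 0.41},
    2: {"C": 0.23, "D": 0.12, "E": 0.51, 4: 0.14},
    3: {"A": 0.24, "E": 0.76},
    4: {"B": 0.13, "F": 0.01, "H": 0.86},
    5: {"G": 0.1, 2: 0.4, 3: 0.5},
}


def flatten_all(mixes):
    """Return {compound: {constituent: net fraction}} for every compound.

    Hypothetical helper: each compound is flattened once and then reused.
    """
    cache = {}           # compound -> already flattened {constituent: fraction}
    in_progress = set()  # compounds on the current recursion path

    def flatten(compound):
        if compound in cache:
            return cache[compound]
        if compound in in_progress:
            # We came back to a compound we are still expanding: a cycle.
            raise ValueError(f"cyclic reference involving compound {compound!r}")
        in_progress.add(compound)

        totals = {}
        for component, ratio in mixes[compound].items():
            if component in mixes:
                # Assumption: any key of `mixes` is a sub-compound.
                for sub, subratio in flatten(component).items():
                    totals[sub] = totals.get(sub, 0.0) + ratio * subratio
            else:
                totals[component] = totals.get(component, 0.0) + ratio

        in_progress.discard(compound)
        cache[compound] = totals
        return totals

    return {compound: flatten(compound) for compound in mixes}


flat = flatten_all(mixes)

for compound, components in flat.items():
    print(compound, components)
```

With the cache, the work is roughly proportional to the number of entries in `mixes` times the number of distinct constituents, instead of growing with the number of paths through the nesting. For a dataset as small as the sample this makes no practical difference; it only starts to matter when many compounds share deep sub-compounds.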