python-2.7cartesian-productradix-tree

Reversing (or simplifying) a Cartesian product?


To make things easier but also more complicated, I tried to implement a concept of "combined / concise tags" that expand further on into multiple basic tag forms.

In this case the tags consist of (one or more) "sub-tag(s)", delimited by semicolons:

food:fruit:apple:sour/sweet

drink:coffee/tea:hot/cold

wall/bike:painted:red/blue

Slashes indicate "sub-tag" interchangeability. Therefore the interpreter translates them to this:

food:fruit:apple:sour
food:fruit:apple:sweet

drink:coffee:hot
drink:coffee:cold
drink:tea:hot
drink:tea:cold

wall:painted:red
wall:painted:blue
bike:painted:red
bike:painted:blue

The code used (not perfect, but works):

import itertools

def slash_split_tag(tag):
    if not '/' in tag:
        return tag
    subtags = tag.split(':')
    pattern, v_pattern = (), ()
    for subtag in subtags:
        if '/' in subtag:
            pattern += (None,)
            v_pattern += (tuple(subtag.split('/')),)
        else:
            pattern += (subtag,)
    def merge_pattern_and_product(pattern, product):
        ret = list(pattern)
        for e in product:
            ret[ret.index(None)] = e
        return ret
    CartesianProduct = tuple(itertools.product(*v_pattern)) # http://stackoverflow.com/a/170248
    return [ ':'.join(merge_pattern_and_product(pattern, product)) for product in CartesianProduct ]

#===============================================================================
# T E S T
#===============================================================================

for tag in slash_split_tag('drink:coffee/tea:hot/cold'):
    print tag
print
for tag in slash_split_tag('A1/A2:B1/B2/B3:C1/C2:D1/D2/D3/D4/EE'):
    print tag

Question: How can I possibly revert this process? I need this for readability reasons.


Solution

  • Here's a simple, first-pass attempt at such a function:

    def compress_list(alist):
        """Compress a list of colon-separated strings into a more compact
        representation.
        """
        components = [ss.split(':') for ss in alist]
    
        # Check that every string in the supplied list has the same number of tags
        tag_counts = [len(cc) for cc in components]
        if len(set(tag_counts)) != 1:
            raise ValueError("Not all of the strings have the same number of tags")
    
        # For each component, gather a list of all the applicable tags. The set
        # at index k of tag_possibilities is all the possibilities for the
        # kth tag
        tag_possibilities = list()
        for tag_idx in range(tag_counts[0]):
            tag_possibilities.append(set(cc[tag_idx] for cc in components))
    
        # Now take the list of tags, and turn them into slash-separated strings
        tag_possibilities_strs = ['/'.join(tt) for tt in tag_possibilities]
    
        # Finally, stitch this together with colons
        return ':'.join(tag_possibilities_strs)
    

    Hopefully the comments are sufficient in explaining how it works. A few caveats, however:

    Testing with the three lists given in the question seems to work, although as I said, the order may be different to the initial strings:

    food:fruit:apple:sweet/sour
    drink:tea/coffee:hot/cold
    wall/bike:painted:blue/red