python list combinations cartesian-product sublist

Facing trouble in creating sublists of a list


My task is to create combinations, something like a Cartesian product, from some attribute lines of a library file. I am currently having trouble grouping the lines that share the same attribute (the parameters that follow differ, of course) into sublists of a list. Note that my input may contain a thousand attribute lines, which I need to extract from the library file.

######################

Example input:

attr1 apple 1                                                          
attr1 banana 2

attr2 grapes 1                                   
attr2 oranges 2

attr3 watermelon 0

######################

Example output:

[['attr1 apple 1','attr1 banana 2'], ['attr2 grapes 1','attr2 oranges 2'], ['attr3 watermelon 0']]

The result I am getting:

['attr1 apple 1','attr1 banana 2', 'attr2 grapes 1','attr2 oranges 2', 'attr3 watermelon 0']

Below is the code:

import re

# regex pattern definition
pattern = re.compile(r'attr\d+')

# Open the file for reading
with open(r"file path") as file:
    # Initialize an empty list to store matching lines
    matching_lines = []

    # reading each line 
    for line in file:
        # regex pattern match
        if pattern.search(line):
            # matching line append to the list
            matching_lines.append(line.strip())

# Group the elements based on the regex pattern

# The required list
grouped_elements = []

# Temporary list for sublist grouping
current_group = []

for sentence in matching_lines:
    if pattern.search(sentence):
        current_group.append(sentence)
    else:
        if current_group:
            grouped_elements.append(current_group)
        current_group = [sentence]

if current_group:
    grouped_elements.append(current_group)

# Print the grouped elements
for group in grouped_elements:
    print(group)


Solution

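  • When the file is already sorted, so that all lines for the same attribute are adjacent, there is a simple solution with itertools.groupby, which only groups consecutive lines that share a key: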

    from itertools import groupby
    
    def read_data(filename):
        """Yields one line at a time, skipping empty lines"""
        with open(filename) as file:
            for line in file:
                line = line.strip()
                if not line:
                    continue
                yield line      
    
    def grouping_key(x):
        "Selects the part of the line to use as key for grouping"
        return x.split()[0]   # The first word
    
    groups = []
    for k, g in groupby(read_data("sample.txt"), grouping_key):
        groups.append(list(g))
    
    print(groups)
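
  • If the lines for one attribute are not guaranteed to be adjacent, groupby would split them into several groups. A minimal sketch of an alternative, assuming the first word of each line is still the grouping key: collect the lines in a dict, then feed the groups to itertools.product to get the Cartesian product mentioned in the question. The name group_lines and the shuffled sample list are illustrative, not part of the original code.

```python
from collections import defaultdict
from itertools import product


def group_lines(lines):
    """Group non-empty lines by their first word, in first-seen key order."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if line:
            # The first word (e.g. "attr1") is assumed to be the key
            groups[line.split()[0]].append(line)
    return list(groups.values())


# Sample lines from the question, deliberately shuffled (illustrative data)
sample = [
    "attr2 grapes 1",
    "attr1 apple 1",
    "",
    "attr3 watermelon 0",
    "attr1 banana 2",
    "attr2 oranges 2",
]

groups = group_lines(sample)
print(groups)

# Cartesian product: each combination takes one line from every group
combos = list(product(*groups))
for combo in combos:
    print(combo)
```

    With a file, the same function could be called on the open file object instead of the sample list, since it only needs an iterable of lines.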