My task is to create combinations, something like a Cartesian product, of some attribute lines from a library file. I am currently stuck on grouping lines that share the same attribute (the parameters that follow the attribute differ, of course) into sublists of a list. Keep in mind that my input may contain a thousand attribute lines, which I need to extract from a library file.
######################
Example input:
attr1 apple 1
attr1 banana 2
attr2 grapes 1
attr2 oranges 2
attr3 watermelon 0
######################
Example output:
[['attr1 apple 1','attr1 banana 2'], ['attr2 grapes 1','attr2 oranges 2'], ['attr3 watermelon 0']]
The result I am getting:
['attr1 apple 1','attr1 banana 2', 'attr2 grapes 1','attr2 oranges 2', 'attr3 watermelon 0']
Below is the code:
import re

# regex pattern definition
pattern = re.compile(r'attr\d+')

# Open the file for reading
with open(r"file path") as file:
    # Initialize an empty list to store matching lines
    matching_lines = []
    # reading each line
    for line in file:
        # regex pattern match
        if pattern.search(line):
            # matching line append to the list
            matching_lines.append(line.strip())

# Grouping the elements based on the regex pattern
# The required list
grouped_elements = []
# Temporary list for sublist grouping
current_group = []
for sentence in matching_lines:
    if pattern.search(sentence):
        current_group.append(sentence)
    else:
        if current_group:
            grouped_elements.append(current_group)
        current_group = [sentence]
if current_group:
    grouped_elements.append(current_group)

# Print the grouped elements
for group in grouped_elements:
    print(group)
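The loop above never starts a new group: every entry in matching_lines has already matched the pattern, so the else branch is never reached and all lines land in a single sublist. Below is a minimal sketch of one way around that, comparing each line's attribute name with the previous one. It assumes the attribute name is always the first whitespace-separated word, that lines are non-empty, and that lines with the same attribute are adjacent; the name group_adjacent is made up for illustration.

```python
def group_adjacent(lines):
    """Group consecutive lines that share the same first word."""
    grouped = []
    previous_key = None
    for line in lines:
        key = line.split()[0]  # assumption: the attribute name is the first word
        if grouped and key == previous_key:
            # Same attribute as the previous line: extend the current sublist
            grouped[-1].append(line)
        else:
            # Attribute changed (or first line): start a new sublist
            grouped.append([line])
        previous_key = key
    return grouped


sample = [
    "attr1 apple 1",
    "attr1 banana 2",
    "attr2 grapes 1",
    "attr2 oranges 2",
    "attr3 watermelon 0",
]
print(group_adjacent(sample))
```

Here matching_lines from the code above could be passed in place of sample.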
When the file is already sorted, so that lines with the same attribute are adjacent, there is a simple solution using itertools.groupby:
from itertools import groupby

def read_data(filename):
    """Yields one line at a time, skipping empty lines"""
    with open(filename) as file:
        for line in file:
            line = line.strip()
            if not line:
                continue
            yield line

def grouping_key(x):
    "Selects the part of the line to use as key for grouping"
    return x.split()[0]  # The first word

groups = []
for k, g in groupby(read_data("sample.txt"), grouping_key):
    groups.append(list(g))
print(groups)
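If lines with the same attribute are not guaranteed to be adjacent, groupby would split one attribute across several groups. A sketch of an alternative is to collect the lines in a dictionary keyed on the first word, then pass the groups to itertools.product for the Cartesian product mentioned in the question. The helper names group_by_first_word and combinations_of are hypothetical, and the sketch assumes the attribute name is the first word of each line.

```python
from itertools import product


def group_by_first_word(lines):
    """Group lines by their first word, keeping first-seen order of the keys."""
    groups = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        # assumption: the attribute name is the first whitespace-separated word
        groups.setdefault(line.split()[0], []).append(line)
    return list(groups.values())


def combinations_of(groups):
    """Lazily yield one tuple per combination, taking one line from each group."""
    return product(*groups)


unsorted_lines = [
    "attr1 apple 1",
    "attr2 grapes 1",
    "attr1 banana 2",
    "attr3 watermelon 0",
    "attr2 oranges 2",
]
groups = group_by_first_word(unsorted_lines)
print(groups)
for combo in combinations_of(groups):
    print(combo)
```

product returns an iterator, so the combinations are generated one at a time. That matters with many attributes, since the number of combinations is the product of the group sizes.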