I am trying to define a regular expression to use as a text pattern in the entity ruler component of my spaCy model. The aim is to add tokens with the "COMP" label whenever it finds words structured like this:
To do so, I use the following method:
def add_component_patterns_re(input_references, model_ruler):
    ruler = model_ruler
    ref_patterns = []
    letters = ['V', 'B', 'F', 'K', 'S']
    print("Adding component patterns")
    for ref in input_references.iloc[:, 0]:
        # print(f"Adding references for system: {ref}")
        for letter in letters:
            pattern_text = fr'{ref}(-| ){letter}[0-9]{{3}}'
            pattern = {"TEXT": {"REGEX": fr'{ref}(-| ){letter}[0-9]{{3}}'}}
            ref_patterns.append({"label": "COMP", "pattern": pattern})
    ruler.add_patterns(ref_patterns)
    return ref_patterns
When I print the added patterns, the output list looks correct to me, so my guess is that I am doing something wrong when defining the pattern to add to the ruler. For information, I've also tried wrapping the pattern in a list, like this:
pattern = [{"TEXT": {"REGEX": fr'{ref}(-| ){letter}[0-9]{{3}}'}}]
But the result is the same: I can't seem to get any match.
Does anyone have a suggestion? Thanks in advance!
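A possible explanation, as far as I understand spaCy's matcher: a `REGEX` inside a token pattern is tested against the text of one token at a time, and a token never contains the space that separates it from the next one. The sketch below uses only Python's `re` module to illustrate this. The reference `ABC`, the letter `V` and the sample sentence are made up, and `str.split()` is only a rough stand-in for spaCy's tokenizer.

```python
import re

# Hypothetical reference and letter, for illustration only.
ref, letter = "ABC", "V"
regex = re.compile(fr"{ref}(-| ){letter}[0-9]{{3}}")

text = "Replace ABC V123 today"

# On the raw string, the regex matches as intended.
whole_match = regex.search(text)

# Whitespace splitting as a rough stand-in for tokenization: spaCy's tokenizer
# does more than this, but it also never keeps the separating space in a token.
pieces = text.split()
piece_matches = [p for p in pieces if regex.search(p)]

print(whole_match.group(0))  # the match found on the full string
print(piece_matches)         # pieces that match on their own
```

No single piece matches here, because the reference and the component code end up in different pieces. The hyphenated form may be split in a similar way, depending on the tokenizer's rules.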
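In the end I got it working by generating every possible reference as a plain string pattern instead of using a regex: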
print(f"Adding references for system: {ref}")
for letter in letters:
    for nnn in range(1000):
        pattern = f"{ref}-{letter}{nnn:03d}"
        ref_patterns.append({"label": "COMP", "pattern": pattern})
        pattern = f"{ref} {letter}{nnn:03d}"
        ref_patterns.append({"label": "COMP", "pattern": pattern})
This adds two string patterns (hyphen and space variants) for each combination of reference, letter and number. The code is lengthier and a tad slower, but it does the job just fine!
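For comparison, here is a minimal sketch of a token-based alternative that keeps a regex but spreads the pattern over several tokens, one dict per token. It assumes spaCy v3, a blank English pipeline, a made-up reference `ABC`, and that each reference is itself a single token. I have not tested it against the real data.

```python
import spacy

# Assumption: spaCy v3 with a blank English pipeline (no trained model needed).
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")

refs = ["ABC"]      # hypothetical system references
letters = "VBFKS"   # same letters as in the code above

patterns = []
for ref in refs:
    patterns.append({
        "label": "COMP",
        "pattern": [
            {"TEXT": ref},                    # first token: the reference
            {"TEXT": "-", "OP": "?"},         # optional hyphen token
            # last token: one of the letters followed by exactly three digits
            {"TEXT": {"REGEX": fr"^[{letters}][0-9]{{3}}$"}},
        ],
    })
ruler.add_patterns(patterns)

doc = nlp("Replace ABC V123 and check ABC-K045 afterwards.")
found = [(ent.text, ent.label_) for ent in doc.ents]
print(found)
```

This needs one pattern per reference rather than thousands of strings. Whether the hyphenated form matches depends on the tokenizer splitting the hyphen into its own token, so it is worth checking `[t.text for t in doc]` on real examples first.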