python nlp spacy-3

spaCy Regex "SyntaxError: invalid syntax"


Hi everyone, I am running this code in spaCy to match tokens with a Matcher pattern, but I get an error:

import spacy
from spacy.matcher import Matcher
nlp = spacy.load("en_core_web_md")
doc1 = nlp("Hello hello hello, how are you?")
doc2 = nlp("Hello, how are you?")
doc3 = nlp("How are you?")
pattern = [{"LOWER": {"IN": ["hello", "hi", "hallo"]},"OP": "*",{"IS_PUNCT": True}}]
matcher.add("greetings",  [pattern])
for mid, start, end in matcher(doc1):
    print(start, end, doc1[start:end])

The error is

pattern = [{"LOWER": {"IN": ["hello", "hi", "hallo"]},"OP": "*",{"IS_PUNCT": True}}]
                                                                                  ^
SyntaxError: invalid syntax

I am following a book called Mastering spaCy and copy-pasted the code from it, and I checked that I did not include any special characters.

Regards


Solution

  • A pattern added to the Matcher consists of a list of dictionaries.

    (from the docs). Your code, written more legibly:

    pattern = [
        {
            "LOWER": {"IN": ["hello", "hi", "hallo"]},
            "OP": "*",
            {"IS_PUNCT": True}
        }
    ]
    

    The first dictionary has three entries, but the third entry is malformed: each entry in a dictionary must be a key: value pair, and {"IS_PUNCT": True} is a bare value with no key, which is not valid dictionary syntax. That is what the SyntaxError is pointing at.
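As a quick check that this is a plain Python syntax problem and has nothing to do with spaCy, here is a minimal sketch that only asks Python's own parser about the two literals. The helper name `parses` and the variables `broken` and `fixed` are made up for this illustration.

```python
import ast

# The literal from the question: the third entry of the dict is a bare value.
broken = '[{"LOWER": {"IN": ["hello", "hi", "hallo"]}, "OP": "*", {"IS_PUNCT": True}}]'

# The same content with the punctuation dict moved out into its own list element.
fixed = '[{"LOWER": {"IN": ["hello", "hi", "hallo"]}, "OP": "*"}, {"IS_PUNCT": True}]'


def parses(source):
    """Return True if `source` is a syntactically valid Python expression."""
    try:
        ast.parse(source, mode="eval")
        return True
    except SyntaxError:
        return False


print(parses(broken))  # False
print(parses(fixed))   # True

# The fixed literal evaluates to a list of two dictionaries.
pattern = ast.literal_eval(fixed)
print(len(pattern))  # 2
```

No spaCy import is involved, so the error would appear even in a file that never touches the Matcher.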

    Along those lines,

    Each dictionary describes one token and its attributes.

    A token that, lowercased, is in ["hello", "hi", "hallo"] can never also be punctuation, so those conditions cannot live in the same dictionary. You seem to want to match something like "Hi Hi Hello!": a greeting token that may repeat, followed by a punctuation token. That takes two dictionaries, one per token, something like

    pattern = [
        {
            "LOWER": {"IN": ["hello", "hi", "hallo"]},
            "OP": "*",
        },
        { "IS_PUNCT": True }
    ]