
spaCy: finding Spans that lie completely within another Span


I have docs in spacy that use spans, such as:

sent = 'I eat 5 apples and 2 bananas.'
doc = nlp(sent)

doc.spans['sc'] = [
   Span(doc, 2, 3, 'Ingredient'),
   Span(doc, 5, 6, 'Ingredient'),
   Span(doc, 2, 6, 'Meal')]

How can I iterate over all spans with the label 'Meal' and show the spans that lie completely within the boundaries of each of them? I know there is something similar for ents within spans, but that is not what I'm looking for.


Solution

  • spaCy's SpanGroup object has a has_overlap property that works as an initial check: if no spans in the group overlap, none can lie inside another. After that, a couple of loops or list comprehensions comparing the .start and .end attributes of the spans is enough to find the contained ones.

    Here's how I would write a snippet to handle such a task:

    import spacy
    from spacy.tokens import Span
    
    nlp = spacy.load('en_core_web_sm')
    
    sent = 'I eat 5 apples and 2 bananas.'
    doc = nlp(sent)
    
    doc.spans['sc'] = [
        Span(doc, 0, 1, 'Subject'),
        Span(doc, 1, 2, 'Verb'),
        Span(doc, 3, 4, 'Ingredient'),
        Span(doc, 6, 7, 'Ingredient'),
        Span(doc, 2, 7, 'Meal')]
    
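    # Cheap pre-check: containment implies overlap, so skip the search if nothing overlaps
    if doc.spans['sc'].has_overlap:
        # (start, end) token boundaries of every 'Meal' span
        meal_start_ends = [(span.start, span.end) for span in doc.spans['sc'] if span.label_ == 'Meal']
        # For each meal, collect the 'Ingredient' spans lying entirely inside those boundaries
        meal_ingredients = [[ig for ig in doc.spans['sc'] if ig.start >= meal[0] and ig.end <= meal[1] and ig.label_ == 'Ingredient'] for meal in meal_start_ends]
        print(meal_ingredients)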
    

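    This snippet should print [[apples, bananas]]: one inner list per 'Meal' span, holding the 'Ingredient' spans that fall inside it. Note that the token offsets here differ from those in the question, where Span(doc, 2, 3) covers the token "5" rather than "apples".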
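
The same containment check can be wrapped in a small reusable helper. This is only a sketch: the name spans_within and its parameters are hypothetical, not part of spaCy. It assumes nothing beyond each span exposing .start, .end and .label_, which spaCy Span objects do, so it needs no imports of its own.

```python
def spans_within(spans, outer_label, inner_label=None):
    """Pair each span labelled `outer_label` with the spans it fully contains.

    Hypothetical helper (not a spaCy API). `spans` is any iterable of
    objects with .start, .end and .label_ attributes, for example
    doc.spans['sc'].
    """
    # Materialise once so the identity check below compares the same objects
    spans = list(spans)
    result = []
    for outer in spans:
        if outer.label_ != outer_label:
            continue
        inner = [
            s for s in spans
            if s is not outer                                   # skip the container itself
            and outer.start <= s.start and s.end <= outer.end   # fully inside
            and (inner_label is None or s.label_ == inner_label)
        ]
        result.append((outer, inner))
    return result
```

With the doc from the snippet above, spans_within(doc.spans['sc'], 'Meal', 'Ingredient') would return a single pair: the 'Meal' span together with a list holding the apples and bananas spans. Leaving inner_label as None returns every contained span regardless of label.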