python scrapy selector

Scrapy: select last descendant node?


I have a dict with selectors which I use to get data:

for key, selector in selectors.items():
    data[key] = response.css(selector).get().strip()

One of the selectors is span::text, but sometimes the text is wrapped in an additional a tag. My solution is to make that entry a list including span a::text:

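for key, selector in selectors.items():
    if isinstance(selector, list):
        # Try each candidate selector until one yields non-empty text.
        for sel in selector:
            # default="" avoids calling .strip() on None when nothing matches.
            data[key] = response.css(sel).get(default="").strip()
            if data[key]:
                break
    else:
        data[key] = response.css(selector).get(default="").strip()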

Is there a way to change the selector so that it will get the text I want whether there's an a tag or not? I would like the script to be a single line with .get().strip().


Solution

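  • Sure, you can just use 'span *::text'. In Scrapy selectors this matches the text nodes of the span itself as well as those of any element nested inside it, so it works with or without the a tag.

    To demonstrate: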

    In [1]: from scrapy.selector import Selector
    
    In [2]: html1 = '<span><a>text contents</a></span>'
    
    In [3]: html2 = '<span>text contents</span>'
    
    In [4]: selector1 = Selector(text=html1)
    
    In [5]: selector2 = Selector(text=html2)
    
    In [6]: selector1.css('span *::text').get().strip()
    Out[6]: 'text contents'
    
    In [7]: selector2.css('span *::text').get().strip()
    Out[7]: 'text contents'