I have a dict with selectors which I use to get data:
for key, selector in selectors.items():
    data[key] = response.css(selector).get().strip()
One of the selectors is span::text, but sometimes the text is wrapped in an additional a tag. My solution is to make that entry a list including span a::text:
for key, selector in selectors.items():
    if isinstance(selector, list):
        for sel in selector:
            data[key] = response.css(sel).get().strip()
            if data[key] not in ["", None]: break
    else:
        data[key] = response.css(selector).get().strip()
Is there a way to change the selector so that it will get the text I want whether there's an a tag or not? I would like the script to be a single line with .get().strip().
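Sure, you can just use 'span *::text'. In Scrapy, this matches text nodes directly inside the span as well as those inside any element nested in it.
To demonstrate: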
In [1]: from scrapy.selector import Selector
In [2]: html1 = '<span><a>text contents</a></span>'
In [3]: html2 = '<span>text contents</span>'
In [4]: selector1 = Selector(text=html1)
In [5]: selector2 = Selector(text=html2)
In [6]: selector1.css('span *::text').get().strip()
Out[6]: 'text contents'
In [7]: selector2.css('span *::text').get().strip()
Out[7]: 'text contents'