python scrapy scrapy-shell

Scrapy, get a href from inside a H3 tag?


I'm currently trying to scrape the link and title from the following piece of HTML, and I can't find a way to do it despite reading the Scrapy docs for a while.

<h3 class="data"> 
  <a href="example.com" title="uniqueTitle"></a>
</h3>

What's the best way of doing this? I should also note that there are many of these <h3> elements on the page with the same class but different <a> tags, and I want to scrape all of them.
Thanks in advance!


Solution

  • To get all the URLs inside the h3 tags, you can use a CSS selector with ::attr(href), e.g.:

    from scrapy import Selector
    sel = Selector(text='''<h3 class="data"> 
      <a href="example.com" title="uniqueTitle"></a>
    </h3>''')
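    # Match every <a> that is a direct child of <h3 class="data"> and return its href values as a list
    print(sel.css('h3.data > a::attr(href)').getall())  # .extract() is the older, equivalent name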
    

    Output:

    ['example.com']