python scrapy

scrapy: extract property from a selector


I am sorry for the beginner question but this is driving me crazy:

Imagine I have this selector for a group of span elements:

ori = response.xpath(
    "//div[@class='comparison-row']"
    "//div[contains(@class,'modern-translation')]"
    "//span[contains(@class,'line-mapping')]"
)

I need to extract two properties from each span: its data-id attribute and its text.

I do:

for r in ori:
    id_n = r.xpath("@data-id").extract()
    text_n = r.xpath("/text()").extract()
    if len(id_n) != 0 and len(text_n) != 0:
        ids.append(id_n)
        text.append(text_n)

But the following returns an error:

text_n=r.xpath("/text()").extract()

I tried:

for r in ori:
    n=r.extract()
    print(n) 

I have this output:

<span class="line-mapping" data-id="40641-1502046130379-55525"> </span>
<span class="line-mapping" data-id="40641-1501842475891-53929">I'll stay at home and pray for God's blessing in your attempt.</span>
<span class="line-mapping" data-id="40641-1501842481535-22321"> Leave tomorrow, and be sure of this: anything that I can help you with, you shall have. </span>

I need to extract the text of each span.


Solution

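  • You need to make your XPath expression relative to the current element. A leading / makes the path absolute, so /text() is evaluated from the document root instead of from the span. Prefix it with a dot: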

    text_n = r.xpath("./text()").extract()
    

    Also, "if len(id_n)!=0 and len(text_n)!=0:" is better written as "if id_n and text_n:", because an empty list is falsy in Python.