Tags: python, selenium, xpath, lxml, lxml.html

Find element by text and print the next/previous sibling


I am trying to scrape the contents of this page, http://targetstudy.com/school/62292/universal-academy/

The problem is that the fields are not always in the same order. Sometimes the data is Name-Address-Pin-Mobile-etc., and sometimes the address is missing, giving Name-Pin-Mobile.

There is no specific class defined, so I am not sure which XPath to use to grab the exact text. I am using Selenium with Python.

Can we do something like find an element by its text and then print its next sibling? Here is an example to clarify:

<td>
  <b>Address :</b>
  "  Sri Saadhuraam Parisar, Kosamnara, Kotra Road Raigarh "
  </td>

So, is there a way to find the element by the text "Address :" and print the line that follows it, " Sri Saadhuraam Parisar, Kosamnara, Kotra Road Raigarh "?
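In DOM terms, the address is the text node that immediately follows the `<b>` element. Both ElementTree and lxml expose that node as the element's `.tail`. Here is a minimal standard-library sketch. It assumes the markup is well-formed like the snippet above, and `text_after_label` is a hypothetical helper name. The double quotes in the snippet are most likely how the browser's inspector displays a text node, so they are left out here.

```python
import xml.etree.ElementTree as ET

def text_after_label(root, label):
    """Return the text right after the <b> whose text contains `label`, or None."""
    for b in root.iter("b"):
        # b.text is the label itself; b.tail is the text node that follows </b>
        if b.text and label in b.text:
            return (b.tail or "").strip()
    return None  # label not present on this page (e.g. no Address field)

snippet = """<td>
  <b>Address :</b>
  Sri Saadhuraam Parisar, Kosamnara, Kotra Road Raigarh
  </td>"""

root = ET.fromstring(snippet)
print(text_after_label(root, "Address"))
```

With `lxml.html`, the same idea can be written as the XPath `//b[contains(., 'Address')]/following-sibling::text()[1]`. This selects the first text node after the matching `<b>`.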

Could someone please advise. Thanks in advance.

Here is part of my code so far:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.keys import Keys
import time
from selenium.webdriver.common.action_chains import ActionChains
import lxml.html
from selenium.common.exceptions import NoSuchElementException

path_to_chromedriver = 'chromedriver.exe'
# Newer Selenium 4 releases take the driver path through a Service object, not executable_path
browser = webdriver.Chrome(service=Service(path_to_chromedriver))
browser.get('http://targetstudy.com/school/62292/universal-academy/')
stuff = browser.page_source.encode('ascii', 'ignore')
tree = lxml.html.fromstring(stuff)
# Returns an empty list: "Address :" is the text of the <b> child, not one of the <td>'s own text nodes
address1 = tree.xpath("//td[contains(text(), 'Address')]/text()")
print(address1)
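Selenium's `find_element` can only return elements, not bare text nodes. This means an XPath ending in `/text()` or `following-sibling::text()` cannot be passed to it directly. One workaround is to locate the cell through its `<b>` child and then cut the label off the cell's visible text. This sketch assumes the label and value share one `<td>`, and `strip_label` is a hypothetical helper:

```python
def strip_label(cell_text, label):
    """Return whatever follows `label` in a cell's visible text ("" if the label is absent)."""
    _, found, rest = cell_text.partition(label)
    return rest.strip() if found else ""

# The kind of visible text Selenium's .text reports for the cell in the question
cell_text = "Address : Sri Saadhuraam Parisar, Kosamnara, Kotra Road Raigarh"
print(strip_label(cell_text, "Address :"))
```

In Selenium 4, the cell can be fetched with `browser.find_element(By.XPATH, "//td[b[contains(., 'Address')]]")` (using `from selenium.webdriver.common.by import By`), and its `.text` can then be passed to `strip_label`. Wrapping the lookup in `try`/`except NoSuchElementException` covers pages where the Address row is missing.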

Solution

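  • If the label is always inside a <b> tag, you can use:

    //td[contains(b[1], 'Address')]/child::text()

    This returns every text node directly under the matching <td>, which may include whitespace-only strings, so strip the results and drop the empty ones to get the address.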
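The fields vary from page to page (Name-Address-Pin-Mobile on one page, Name-Pin-Mobile on another). Because of this, it may be simpler to collect every label/value pair into a dictionary than to rely on position. Below is a sketch using the standard library's ElementTree. It assumes each value is the text right after its `<b>` label. The sample markup and values are hypothetical, not copied from the site.

```python
import xml.etree.ElementTree as ET

def label_value_pairs(root):
    """Map each <b> label (without the trailing ':') to the text that follows it."""
    fields = {}
    for b in root.iter("b"):
        label = (b.text or "").strip().rstrip(":").strip()
        value = (b.tail or "").strip()  # text node right after </b>
        if label:
            fields[label] = value
    return fields

# Hypothetical page fragment where the Address row is missing
html = """<table>
  <tr><td><b>Name :</b> Universal Academy</td></tr>
  <tr><td><b>Pin :</b> 000000</td></tr>
</table>"""

fields = label_value_pairs(ET.fromstring(html))
print(fields.get("Address", "N/A"), fields.get("Name"))
```

lxml elements have the same `.iter()`, `.text` and `.tail` attributes, so the function should also work on the `tree` built with `lxml.html.fromstring` in the question's code. That parser also copes with markup that is not well-formed XML.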