I am trying to scrape the contents of this page, http://targetstudy.com/school/62292/universal-academy/
The problem is that the set of fields varies from page to page. Sometimes the data is in the order Name-Address-Pin-Mobile-etc., and sometimes the address is missing, giving Name-Pin-Mobile.
There is no specific class defined, and I am not sure which xpath to use to grab the exact text. I am using Selenium Python.
Can we do something like find an element by its text and then print the text node that follows it? Let me give an example to clarify:
<td>
<b>Address :</b>
" Sri Saadhuraam Parisar, Kosamnara, Kotra Road Raigarh "
</td>
So is there a way to find the element by the text "Address :" and print the text that follows it, " Sri Saadhuraam Parisar, Kosamnara, Kotra Road Raigarh "?
Could someone please advise? Thanks in advance.
Here is a part of my code so far,
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
from selenium.webdriver.common.action_chains import ActionChains
import lxml.html
from selenium.common.exceptions import NoSuchElementException
path_to_chromedriver = 'chromedriver.exe'
browser = webdriver.Chrome(executable_path = path_to_chromedriver)
browser.get('http://targetstudy.com/school/62292/universal-academy/')
stuff = browser.page_source.encode('ascii', 'ignore')
tree = lxml.html.fromstring(stuff)
address1 = tree.xpath("//td[contains(text(), 'Address')]/text()")
print(address1)
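Your XPath returns nothing because the label "Address :" is inside the <b> child, not in the <td>'s own text nodes, so contains(text(), 'Address') never matches. If the label is always inside a <b> tag, you can use: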
//td[contains(b[1], 'Address')]/child::text()
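Since some pages omit fields such as the address, it may help to collect every label/value pair and look fields up by name. The sketch below is an alternative to the XPath above, not the same technique: it uses the standard library's xml.etree.ElementTree and the element's .tail attribute (the text that follows the closing </b>). The sample markup, the placeholder Pin values, and the helper name extract_fields are assumptions made for illustration. Real pages are rarely well-formed XML, so in practice you would probably parse with lxml.html.fromstring as in the question; lxml elements also expose .iter() and .tail.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed samples modelled on the <td> snippet in the question.
# The Pin values are placeholders, not data taken from the real page.
WITH_ADDRESS = """<table>
  <tr><td><b>Name :</b> Universal Academy </td></tr>
  <tr><td><b>Address :</b> Sri Saadhuraam Parisar, Kosamnara, Kotra Road Raigarh </td></tr>
  <tr><td><b>Pin :</b> 000000 </td></tr>
</table>"""

WITHOUT_ADDRESS = """<table>
  <tr><td><b>Name :</b> Universal Academy </td></tr>
  <tr><td><b>Pin :</b> 000000 </td></tr>
</table>"""


def extract_fields(markup):
    """Map each bold label (e.g. 'Address') to the text that follows it."""
    root = ET.fromstring(markup)
    fields = {}
    for bold in root.iter("b"):
        # 'Address :' -> 'Address'
        label = (bold.text or "").strip().rstrip(":").strip()
        # .tail is the text between </b> and the next tag (here, </td>)
        value = (bold.tail or "").strip()
        if label:
            fields[label] = value
    return fields


# .get() returns None instead of raising when a field is absent
print(extract_fields(WITH_ADDRESS).get("Address"))
print(extract_fields(WITHOUT_ADDRESS).get("Address"))
```

Looking fields up by label rather than by position means a missing address does not shift the other values.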