http://www.vliz.be/vmdcdata/mangroves/aphia.php?p=browser&id=235056&expand=true#ct (That's the information I am trying to scrape)
I want to scrape these detailed taxonomic trees so that I can manipulate them any way I like.
But there are a few problems in getting this tree data.
I can't fully expand the taxonomic tree. When some nodes expand, others collapse, as the instructions indicate, so saving the full page as an HTML file does not solve my problem. I could repeat the process several times to get separate files and concatenate them, but that seems like an ugly way to do it.
I am tired of clicking: there are so many "plus" signs, and I have to wait.
Is there a way to solve this using Python?
Use Selenium. This will expand the tree by clicking on the "plus" signs and, once it is done, get the entire DOM with all the elements in it:
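import time

from selenium import webdriver
from selenium.webdriver.common.by import By

# XPath matching the "plus" icons of nodes that are still collapsed
PLUS_XPATH = ('.//*[@src="http://www.marinespecies.org/images/aphia/pnode.gif" '
              'or @src="http://www.marinespecies.org/images/aphia/plastnode.gif"]')

browser = webdriver.Chrome()
browser.get('http://www.vliz.be/vmdcdata/mangroves/aphia.php?p=browser&id=235301&expand=true#ct')
while True:
    try:
        # Take the second match; an IndexError here ends the loop
        # once fewer than two "plus" icons are left
        elem = browser.find_elements(By.XPATH, PLUS_XPATH)[1]
        elem.click()
        time.sleep(2)  # give the page time to load the expanded branch
    except Exception:
        break

# Full HTML of the page after the tree has been expanded
content = browser.page_source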
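Since the loop stops on any error, it can be worth checking that the tree really was expanded before you start parsing `content`. The following is only a minimal sketch using the standard library: it counts how many elements in the HTML still point at one of the two "plus" icons named in the XPath above. The names `PlusIconCounter` and `count_collapsed_nodes` are made up for this example and are not part of Selenium.

```python
from html.parser import HTMLParser

# Icon file names taken from the XPath used in the Selenium loop.
PLUS_ICONS = ("pnode.gif", "plastnode.gif")


class PlusIconCounter(HTMLParser):
    """Counts elements whose src attribute points at a "plus" icon."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        src = dict(attrs).get("src") or ""
        # Compare only the file name at the end of the URL
        if src.rsplit("/", 1)[-1] in PLUS_ICONS:
            self.count += 1


def count_collapsed_nodes(html):
    """Return the number of "plus" icons still present in the HTML."""
    parser = PlusIconCounter()
    parser.feed(html)
    return parser.count
```

You would call it as `count_collapsed_nodes(content)`. Because the loop clicks the second match each time, one icon may still be left when it finishes; a larger number suggests the loop stopped early (for example because a click failed) and should be run again.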