python html selenium-webdriver web-scraping selenium-chromedriver

Selenium (Python) returns empty text for an element that exists in the HTML but is not visible on the webpage


I am using Selenium and Python to scrape a website. The results are not known in advance, so I cannot just hardcode a search for specific values (four-letter vehicle codes like "UDAR" in this case). Once the page has loaded, I can scrape all the data I want, with the exception of one field in the HTML. This field's value is not actually visible on the webpage (screenshot 1), but it is present in the HTML (screenshot 2, "UDAR"), where it serves to tag/classify the data.

[Screenshot 1: the rendered webpage, where the code is not shown]

[Screenshot 2: the page HTML, containing "UDAR"]

I have tried many things. I tried grabbing a single element and printing its .text, to no avail. I tried grabbing all the elements of this type (which is what I want) and looping through them. I tried locating by tag name, by class name, and by the higher-level "vehicle-item_summary-container" class or even the enclosing section, hoping I could parse out the data I want (four-letter codes like "UDAR" in this case). Every one of these options returns an empty string when I look for a single element, or a list of empty strings when I look for multiple elements. I have not tried XPath, because the cars returned on the page are generally out of order, so the XPath indices could run 7, 1, 2, 3, 4, 5, 6, 8 or 9, 1, 2, 3, 4, ... and I did not want to build an elaborate mapping back to XPaths from the other field values I can retrieve.

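# Attempt 1: locate by class name (every .text comes back as "")
vehicle_code = driver.find_elements(By.CLASS_NAME, "vehicle-item__tour-info")
# Attempt 2: locate by tag name (same result)
vehicle_code = driver.find_elements(By.TAG_NAME, "p")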

Solution

  • The text you are looking for is hidden: the class vehicle-item__tour-info is styled with display: none, and Selenium's .text returns only text that is visible on the page, so it comes back empty.

    So before pulling the text values, we need to change that property. The script below uses jQuery's $, so it relies on the page having jQuery loaded:

    # Make the hidden elements visible so that .text returns their content
    driver.execute_script('$(".vehicle-item__tour-info").css("display","block")')
    elems = driver.find_elements(By.CLASS_NAME, 'vehicle-item__tour-info')

    tour_info = [e.text for e in elems]
    
    # Output
    ['MVAR', 'ECAR', 'CCAR', 'CFAR', 'IFAR', 'IFDR', 'ICAR', 'ICAE', 'SPAR', 'SCAR', 'SFAR', 'SFDR', 'FCAR', 'PPAR', 'FFAR', 'RFAR', 'FJAR', 'UFAR', 'PFAR', 'IJAR', 'SGAR', 'SGDR', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
    

    As you can see, the unavailable vehicles come back as empty strings. This is easy to fix: just click "Explore Alternative Possibilities" before pulling the data.
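
If you would rather not alter the page's styling or depend on jQuery, one possible alternative is to read each node's textContent through plain JavaScript, since textContent does not depend on whether the element is displayed. This is only a sketch and not part of the approach above: hidden_texts is a hypothetical helper, and driver is assumed to be the WebDriver instance already sitting on the loaded results page.

```python
def hidden_texts(driver, css_selector=".vehicle-item__tour-info"):
    """Return the trimmed textContent of every node matching css_selector.

    Hypothetical helper: the text is read inside the browser, so elements
    styled with display: none should still yield their content.
    """
    script = (
        "return Array.from(document.querySelectorAll(arguments[0]))"
        ".map(node => node.textContent.trim());"
    )
    # execute_script forwards extra positional arguments to the script
    # as arguments[0], arguments[1], ...
    return driver.execute_script(script, css_selector)
```

Usage would look something like codes = hidden_texts(driver) once the page has finished loading. Whether the unavailable vehicles are included depends on their nodes already being present in the DOM, which I have not verified on this site.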

    An alternative approach might be easier. Since all the data is already in the HTML page source, you can extract it with a one-line regex. This returns every code, including those of unavailable vehicles.

    import re
    data = re.findall('(?<=<p class="vehicle-item__tour-info">).+?(?=</p>)', driver.page_source)
    
    # Output
    ['MVAR', 'ECAR', 'CCAR', 'CFAR', 'IFAR', 'IFDR', 'ICAR', 'ICAE', 'SPAR', 'SCAR', 'SFAR', 'SFDR', 'FCAR', 'PPAR', 'FFAR', 'RFAR', 'FJAR', 'UFAR', 'PFAR', 'IJAR', 'SGAR', 'SGDR', 'XXAR', 'FCAH', 'CFAE', 'CFDR', 'PCAR', 'PDAR', 'PXAR', 'LCAR', 'PGAR', 'PGDR', 'FFDR', 'PFDR', 'SKDR', 'RKDR', 'UKDR', 'SPBR', 'PPAE', 'PPBR', 'PPBE', 'SSAR', 'STAR', 'GXAR', 'WXAR', 'UDAR', 'WDAR', 'PQAR', 'OFAR', 'WFAR', 'WFAE', 'SKAR', 'RKAR']
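
Note that the regex above matches only when the opening tag is exactly <p class="vehicle-item__tour-info">. If the markup ever carries extra classes or attributes, a parser-based variant may be more forgiving. Below is a minimal sketch using only the standard library's html.parser. TourInfoParser, extract_codes, and the sample markup are names and data made up for illustration, not something taken from the site.

```python
from html.parser import HTMLParser


class TourInfoParser(HTMLParser):
    """Collect the text of every <p> whose class list includes target_class."""

    def __init__(self, target_class="vehicle-item__tour-info"):
        super().__init__()
        self.target_class = target_class
        self.codes = []
        self._inside = False  # True while we are inside a matching <p>

    def handle_starttag(self, tag, attrs):
        # A valueless class attribute is reported as None, hence the "or".
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and self.target_class in classes:
            self._inside = True
            self.codes.append("")

    def handle_data(self, data):
        if self._inside:
            self.codes[-1] += data

    def handle_endtag(self, tag):
        if tag == "p" and self._inside:
            self._inside = False
            self.codes[-1] = self.codes[-1].strip()


def extract_codes(page_source):
    """Return the text of all matching <p> elements in page_source."""
    parser = TourInfoParser()
    parser.feed(page_source)
    parser.close()
    return parser.codes


# Made-up sample markup; in the scraper this would be driver.page_source.
sample = (
    '<div><p class="vehicle-item__tour-info">UDAR</p>'
    '<p class="vehicle-item__tour-info extra" data-x="1"> MVAR </p>'
    '<p class="other">skip me</p></div>'
)
print(extract_codes(sample))  # prints ['UDAR', 'MVAR']
```

In the scraper itself, the call would presumably be data = extract_codes(driver.page_source). The trade-off is more code than the one-liner in exchange for not depending on the exact attribute layout of the tag.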