I'm fairly new to web scraping and may be in over my head on this, but I am trying to scrape apartment information from a dynamically generated website (https://noveatknox.com/floorplans/). I've gotten as far as scraping the information I need from the default URL (it defaults to the 19th floor). I am trying to have Selenium click on each floor so I can pull the available units and their information, and I've isolated the link to click for each floor. However, after the click the inner HTML always shows the "no available apartments" message and contains no unit information.
I suspect something is wrong with the click, since the page loads with no apartments. The dynamically generated HTML is hard to work with, and I cannot find a single place that holds all the information.
Here's what I have so far (a[11] refers to a specific floor that I know has available apartments). I plan to loop over all floors once I nail down the base code:
xpath = '//*[@id="mobile-floor-carousel-list"]/a[11]'
# Click page
floor = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.XPATH, xpath))).click()
print("clicked on page")
floorplans = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.XPATH, "//div[@id='unit-list-items']"))
)
print("waited for page to load")
print(driver.find_element(By.XPATH, "//html").get_attribute('innerHTML'))
The floor link I am clicking looks like this:
<a href="#" class="skylease__toolbar-floor-link swiper-slide skylease__mobile-floor-link--has-units" data-js-hook="mobile-floor-selector" data-floor="14" style="width: 50.6154px; margin-right: 5px;"><div><span>Floor</span><span>14</span><span class="skylease-avail-count"><span data-floor="14" data-js-hook="available-unit-count">4</span> avail</span></div></a>
After the click, the inner HTML shows only this, where the apartment info should be:
</div>
<div id="unit-list" class="skylease__unit-list">
<p id="unit-list-message" class="skylease__unit-message">There are no available apartments on this floor.</p>
<div class="skylease__unit-list-items" id="unit-list-items"></div>
</div>
It should look something like this (what I see when I click the floor manually):
<div id="unit-list" class="skylease__unit-list">
<p id="unit-list-message" class="skylease__unit-message" style="display: none;">There are no available apartments on this floor.</p>
<div class="skylease__unit-list-items" id="unit-list-items"><a href="#" class="skylease__unit-list-item skylease__unit-list-item--alt" data-unit="1504" style="display: block;">
....edited out for simplicity
<div class="skylease__unit-list-item-info-wrap">
<div class="skylease__unit-list-item-info">
<div class="skylease__unit-list-item-details skylease__unit-list-item-details--unit">
#1504
...edited out for simplicity
<div class="skylease__unit-list-item-info">
<p class="skylease__unit-list-item-details">Studio</p>
<p class="skylease__unit-list-item-details">1 bath</p>
<p class="skylease__unit-list-item-details"></p>
<p class="skylease__unit-list-item-details">512 sq. ft.</p>
<p class="skylease__unit-list-item-details"></p>
</div>
<div class="skylease__unit-list-item-info">
<p class="skylease__unit-list-item-details skylease__unit-list-item-details--price">
<span>1,807</span>
<span>3683</span>
</p>
</div>
</div>
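Here's a simpler way to do this: select the available-unit count span inside each floor link, skip floors whose count is 0, click the rest, and wait for the unit links inside #unit-list-items to become visible.
The working code is below.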
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

url = 'https://noveatknox.com/floorplans/'
driver = webdriver.Chrome()
driver.maximize_window()
driver.get(url)
wait = WebDriverWait(driver, 10)

# Each matched span holds a floor's available-unit count and its data-floor number
floors = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "#mobile-floor-carousel-list > a span[data-js-hook]")))
for floor in floors:
    # Skip floors that report no available units
    if floor.text != "0":
        print("Floor: " + floor.get_attribute("data-floor") + ", Available units: " + floor.text)
        floor.click()
        # Wait until the unit links for the clicked floor are visible, then print each one
        for unit in wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "#unit-list-items a"))):
            print(unit.text)
            print("")
        print("")
and it outputs...
Floor: 4, Available units: 1
#418
Studio
1 bath
605 sq. ft.
1,801
3815
Floor: 5, Available units: 2
#503
2 bed
2 bath
1166 sq. ft.
3,826
7896
and so on...
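If you want structured records instead of printed text, each unit's text block could be parsed along these lines. This is only a sketch based on the output shown above: parse_unit and its field names are hypothetical, and since the page output does not say what the two trailing numbers mean, they are kept as a plain list rather than labelled.

```python
def parse_unit(text):
    """Turn one unit's text block (as printed above) into a dict.

    Field names are illustrative assumptions, not something the site defines.
    """
    record = {"unit": None, "layout": None, "baths": None,
              "sqft": None, "other_numbers": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # the page leaves some detail lines empty
        digits = line.replace(",", "")
        if line.startswith("#"):
            record["unit"] = line[1:]            # "#418" -> "418"
        elif line.endswith("sq. ft."):
            record["sqft"] = int(line.split()[0].replace(",", ""))
        elif "bath" in line.lower():
            record["baths"] = line               # kept as text, e.g. "1 bath"
        elif digits.isdigit():
            # Meaning of these values is not stated in the output above
            record["other_numbers"].append(int(digits))
        else:
            record["layout"] = line              # e.g. "Studio" or "2 bed"
    return record


sample = "#418\nStudio\n1 bath\n605 sq. ft.\n1,801\n3815"
print(parse_unit(sample))
```

In the loop above, print(unit.text) could then be swapped for appending parse_unit(unit.text) to a list, which is easier to write out to CSV later.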