Tags: python, python-3.x, web-scraping, beautifulsoup, mozilla

How to get the bedrooms, square footage, and prices from Zillow?


I asked ChatGPT the following question today:

"python mozlla headers scrape bedrooms square footage and price from zillow"

It gave me this code:

import requests
from bs4 import BeautifulSoup

# Set the URL of the Zillow page you want to scrape
url = "https://www.zillow.com/breckenridge-co/sold/?searchQueryState=%7B%22pagination%22%3A%7B%7D%2C%22usersSearchTerm%22%3A%22Breckenridge%2C%20CO%22%2C%22mapBounds%22%3A%7B%22west%22%3A-106.12936606713866%2C%22east%22%3A-105.9443151027832%2C%22south%22%3A39.45416110834031%2C%22north%22%3A39.497489978187815%7D%2C%22regionSelection%22%3A%5B%7B%22regionId%22%3A14934%2C%22regionType%22%3A6%7D%5D%2C%22isMapVisible%22%3Atrue%2C%22filterState%22%3A%7B%22sort%22%3A%7B%22value%22%3A%22days%22%7D%2C%22fsba%22%3A%7B%22value%22%3Afalse%7D%2C%22fsbo%22%3A%7B%22value%22%3Afalse%7D%2C%22nc%22%3A%7B%22value%22%3Afalse%7D%2C%22fore%22%3A%7B%22value%22%3Afalse%7D%2C%22cmsn%22%3A%7B%22value%22%3Afalse%7D%2C%22auc%22%3A%7B%22value%22%3Afalse%7D%2C%22rs%22%3A%7B%22value%22%3Atrue%7D%2C%22ah%22%3A%7B%22value%22%3Atrue%7D%7D%2C%22isListVisible%22%3Atrue%2C%22mapZoom%22%3A13%7D"

# Send an HTTP GET request to the URL
response = requests.get(url)

# Create a BeautifulSoup object to parse the HTML content
soup = BeautifulSoup(response.content, "html.parser")

# Find the relevant elements containing the data you want to scrape
results = soup.find_all("article", class_="property-card-data")


# Loop through each result and extract the desired information
for result in results:
    # Extract the bedroom information
    bedrooms = result.find("ul", class_="list-card-details").find("li").text.strip()

    # Extract the square footage
    square_footage = result.find("ul", class_="list-card-details").find_all("li")[1].text.strip()

    # Extract the price
    price = result.find("div", class_="list-card-price").text.strip()

    # Print the scraped data
    print("Bedrooms:", bedrooms)
    print("Square Footage:", square_footage)
    print("Price:", price)
    print()
    

The problem is that nothing gets printed. I think the issue is with 'soup.find_all' or the 'class_=' argument. How do these work, exactly?


Solution

  • This is from the FAQ section of Zillow's website:

    Q: Can we retrieve and store Zillow data?

    A: No. You may use the API only to retrieve and display dynamic content from Zillow. You are not permitted to store information locally.

    Q: Instead of using the API, can we reverse engineer a data feed or manually pull information from Zillow?

    A: No. We permit third parties to retrieve data from our site only through the API. Any reverse engineering, spiders, or other techniques used to directly pull data without using the Zillow API is a violation of our Terms of Use.

    Also, the response your code gets back from that URL is a CAPTCHA page asking you to verify that a human is performing the search, so it contains none of the listing elements that 'soup.find_all' is looking for.
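
To see why the loop in the question prints nothing, it may help to look at how 'find_all' behaves when no element matches. The sketch below runs on two inline HTML strings rather than on Zillow. Both strings and the helper name 'diagnose' are made up for illustration, and the CAPTCHA check is only a rough keyword heuristic, not how any particular site marks its bot-check page.

```python
from bs4 import BeautifulSoup


def diagnose(html, tag="article", css_class="property-card-data"):
    """Hypothetical helper: report why a scrape might come back empty."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns every <tag> whose class attribute includes css_class.
    # When nothing matches it returns an empty list instead of raising,
    # so a `for` loop over the result simply never runs.
    matches = soup.find_all(tag, class_=css_class)
    # Rough heuristic (an assumption): the word "captcha" appears in the markup.
    looks_like_captcha = "captcha" in html.lower()
    return {"matches": len(matches), "looks_like_captcha": looks_like_captcha}


# Made-up page that contains the element the scraper expects.
listing_html = '<article class="property-card-data other"><span>3 bds</span></article>'
# Made-up stand-in for a bot-check page.
blocked_html = '<html><body><div id="captcha">Press and hold</div></body></html>'

print(diagnose(listing_html))  # {'matches': 1, 'looks_like_captcha': False}
print(diagnose(blocked_html))  # {'matches': 0, 'looks_like_captcha': True}
```

With the code from the question, the same check could be applied by calling 'diagnose(response.text)' and printing 'response.status_code' before parsing. Note that 'class_' matches an element if any one of its classes equals the given name, which is why the first example still matches despite the extra class.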