Use Python, AutoGPT and ChatGPT to extract data from a downloaded HTML page

Note: If you're downvoting at least share why. I put in a lot of effort to write this question, shared my code and did my own research first, so not sure what else I could add.

I already use Scrapy to crawl websites successfully. I extract specific data from a webpage using CSS selectors. However, it's time-consuming to set up and error-prone. I want to be able to pass the raw HTML to ChatGPT and ask a question like:

"Give me in a JSON object format the price, array of photos, description, key features, street address, and zipcode of the object"

Desired output below. I truncated description, key features and photos for legibility.

{
"price":"$945,000",
"photos":"https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/1500x1500/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134542874?w=3840&q=75;https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/1500x1500/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134542875?w=3840&q=75;https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/1500x1500/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134542876?w=3840&q=75",
"description":"<div>This spacious 2 bedroom 1 bath home easily converts to 3 bedrooms. Featuring a BRIGHT and quiet southern exposure, the expansive great room (with 9ft ceilings) is what sets (...)",
"key features":"Center island;Central air;Dining in living room;Dishwasher",
"street address":"170 West 89th Street, 2D",
"zipcode":"NY 10024",
}

Right now I run into the max chat length of 4096 characters, so I decided to send the page in chunks. However, even with a simple question like "What is the price of this object?", where I'd expect the answer to be "$945,000", I'm just getting a whole bunch of text. I'm wondering what I'm doing wrong. I heard that AutoGPT offers a new layer of flexibility, so I was also wondering if that could be a solution here.

My code:

import requests
from bs4 import BeautifulSoup, Comment
import openai
import json

# Set up your OpenAI API key
openai.api_key = "MYKEY"

# Fetch the HTML from the page
url = "https://www.corcoran.com/listing/for-sale/170-west-89th-street-2d-manhattan-ny-10024/22053660/regionId/1"
response = requests.get(url)

# Parse and clean the HTML
soup = BeautifulSoup(response.text, "html.parser")

# Remove unnecessary tags, comments, and scripts
for script in soup(["script", "style"]):
    script.extract()

# for comment in soup.find_all(text=lambda text: isinstance(text, Comment)):
#     comment.extract()

text = soup.get_text(strip=True)

# Divide the cleaned text into chunks of 4096 characters
def chunk_text(text, chunk_size=4096):
    chunks = []
    for i in range(0, len(text), chunk_size):
        chunks.append(text[i:i+chunk_size])
    return chunks

print(text)

text_chunks = chunk_text(text)

# Send text chunks to ChatGPT API and ask for the price
def get_price_from_gpt(text_chunks, question):
    for chunk in text_chunks:
        prompt = f"{question}\n\n{chunk}"
        response = openai.Completion.create(
            engine="text-davinci-002",
            prompt=prompt,
            max_tokens=50,
            n=1,
            stop=None,
            temperature=0.5,
        )

        answer = response.choices[0].text.strip()
        if answer.lower() != "unknown" and len(answer) > 0:
            return answer

    return "Price not found"

question = "What is the price of this object?"
price = get_price_from_gpt(text_chunks, question)
print(price)

Solution

UPDATED ANSWER 06.28.2023

Your question was very interesting, so I wanted to try to improve my previous answer that you have already accepted.

I noted that my previous answer cost around .05 cents per query to the OpenAI API. This cost was directly related to the text chunking function and to asking the questions in a for loop. I have removed both, because I was able to condense the text to far fewer tokens.

One of the core items that was required to reduce the cost is text cleaning, which is a standard NLP and data science problem. I added some more code to remove additional unneeded text from the SOUP object. There is a performance hit when doing this, but not enough to lose sleep over.
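As a rough illustration of that cleaning step, here is a minimal sketch of the two regular expressions applied to a small sample string. The helper name `clean_text` and the sample strings are made up for this example, and it collapses whitespace with `\s+` plus `strip()`, which is slightly stricter than the code further down.

```python
import re


def clean_text(raw: str) -> str:
    """Drop all-caps words, then collapse runs of whitespace."""
    without_caps = re.sub(r"\b[A-Z]+\b", "", raw)
    return re.sub(r"\s+", " ", without_caps).strip()


# Navigation labels such as BUY / RENT / SELL disappear ...
sample = "BUY RENT SELL   Built in 1910     2 beds $945,000 Window A/C"
print(clean_text(sample))  # Built in 1910 2 beds $945,000 Window /

# ... but so does any legitimate word written in capitals.
print(clean_text("Featuring a BRIGHT and quiet"))  # Featuring a and quiet
```

The second call shows the trade-off: the pattern cannot tell menu labels from emphasised words, so "BRIGHT" and "A/C" are lost as well.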

Refining the query prompt was also needed to submit everything in a single request. Doing this reduces the query costs.

The code below can be refined further. Currently it costs .02 cents per query using text-davinci-003. The prompt will need to be reworked to use text-davinci-002, which is a little cheaper than text-davinci-003.
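If you want to track the cost yourself, a simple estimate can be computed from the token counts. The sketch below is only an assumption about how you might do that: the function name and the numbers are hypothetical, and the rate is a placeholder that you would replace with the current price for your model. The token counts themselves should be available in the `usage` field of the completion response.

```python
def estimate_query_cost(prompt_tokens: int, completion_tokens: int,
                        usd_per_1k_tokens: float) -> float:
    """Estimate the cost in USD of one completion request."""
    total_tokens = prompt_tokens + completion_tokens
    return total_tokens / 1000 * usd_per_1k_tokens


# Hypothetical numbers for illustration only; the rate is a placeholder.
cost = estimate_query_cost(prompt_tokens=900, completion_tokens=100,
                           usd_per_1k_tokens=0.02)
print(f"${cost:.4f}")
```

Because the prompt includes the whole cleaned page, the prompt tokens dominate, which is why trimming the text has such a direct effect on the bill.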

The API query time for the code below can exceed 15 seconds. There are numerous discussions on the community forums at OpenAI about query performance. From my research there is no solid technique on how to improve query performance.

import json
import spacy
import openai
import requests
import re as regex
from bs4 import BeautifulSoup

openai.api_key = 'my_key'


# this code can be refined. 
def remove_unwanted_tags(soup):
    for disclaimer_tag in soup.select('div[class*="Disclaimer__TextContainer"]'):
        disclaimer_tag.decompose()

    for footer_tag in soup.select('div[class*="GlobalFooter__"]'):
        footer_tag.decompose()

    for hidden_tag in soup.select('p[class*="visually-hidden"]'):
        hidden_tag.decompose()

    for agent_contact_card_tag in soup.select('div[class*="AgentContactCard__"]'):
        agent_contact_card_tag.decompose()

    for listing_agent_tag in soup.select('div[class*="DetailsAndAgents__ListingFirmName"]'):
        listing_agent_tag.decompose()

    for property_sale_history_tag in soup.select('section[class*="SalesHistory__"]'):
        property_sale_history_tag.decompose()

    for data in soup(['style', 'script', 'iframe', 'footer', 'h2', 'a']):
        data.decompose()

    dirty_soup = ' '.join(soup.find_all(string=True))
    # Drop all-caps words, then collapse repeated whitespace.
    remove_allcaps_words = regex.sub(r'\b[A-Z]+\b', '', dirty_soup)
    clean_soup = regex.sub(r"\s\s+", " ", remove_allcaps_words)
    return clean_soup


def tokenize_text(text):
    nlp = spacy.load("en_core_web_sm")
    sentences = nlp(text)
    return [sentence.text for sentence in sentences.sents]


def get_property_details_from_gpt(text, question):
    prompt = f"{question}\n\n{text}"
    response = openai.Completion.create(
        engine="text-davinci-003",
        prompt=prompt,
        max_tokens=200,
        n=1,
        stop=None,
        temperature=0.7,
        frequency_penalty=0,
        presence_penalty=0.6
    )

    answer = response.choices[0].text.strip()
    if answer.lower() != "unknown" and len(answer) > 0:
        return answer

url = "https://www.corcoran.com/listing/for-sale/170-west-89th-street-2d-manhattan-ny-10024/22053660/regionId/1"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")
property_photos = [element.find('img').attrs['src'] for element in soup.select('div[class*="carousel-item"]')]

strained_soup = remove_unwanted_tags(soup)
text = u''.join(tokenize_text(strained_soup))

question = """Please extract the following details from the provided items:
{"price": "(1) Exact selling price of this property with dollar sign and thousands formatting.",
"street_address": "(2) Exact street address of this property without state or zip code.",
"description": "(3) Short description of this property.",
"key_features": "(4) Key features of this property. Provide these items in a semicolon delimiter string.",
"state": "(5) State abbreviation for this property.",
"zipcode": "(6) Zip code for this property, if available.",
"photos": ""},
(8) Provide this data back in a python dictionary that can be processed by json.loads"""

query_results = get_property_details_from_gpt(text, question)
query_data = query_results.replace('Answer:', '')
corcoran_properties = json.loads(query_data)
corcoran_properties["photos"] = f"{property_photos}"
corcoran_json = json.dumps(corcoran_properties, indent=4)
print(corcoran_json)

This is the output:

{
    "price": "$945,000",
    "street_address": "170 West 89th Street",
    "description": "This spacious 2 bedroom 1 bath home easily converts to 3 bedrooms. Featuring a and quiet southern exposure, the expansive great room (with 9ft ceilings) is what sets this home apart from others. Paired with a renovated open kitchen, new bathroom, and washer/dryer, this is an Upper West Side gem.",
    "key_features": "Center island; Central air; Dining in living room; Dishwasher; En suite; Excellent light light; Hardwood floors; High ceilings; Modern kitchen; New windows; Open kitchen; Pet friendly; Prewar detail; Storage space; Washer/dryer; Window /",
    "state": "NY",
    "zipcode": "10024",
    "photos": "['https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134542874?w=3840&q=75', 'https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134542875?w=3840&q=75', 'https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134542876?w=3840&q=75', 'https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134542877?w=3840&q=75', 'https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134542878?w=3840&q=75', 'https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134542879?w=3840&q=75', 'https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134542880?w=3840&q=75', 'https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134542881?w=3840&q=75', 'https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134543843?w=3840&q=75', 'https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134543845?w=3840&q=75', 'https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134543846?w=3840&q=75', 'https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134543847?w=3840&q=75', 
'https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134543848?w=3840&q=75', 'https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134543849?w=3840&q=75', 'https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134543850?w=3840&q=75', 'https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134543851?w=3840&q=75', 'https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134543852?w=3840&q=75', 'https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134543853?w=3840&q=75', 'https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134543854?w=3840&q=75', 'https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134543855?w=3840&q=75', 'https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134542882?w=3840&q=75', 'https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134542883?w=3840&q=75', 'https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/PropertyAPI/NewTaxi/5415/mediarouting.vestahub.com/Media/111409705?w=3840&q=75']"
}
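The `replace('Answer:', '')` step in the code above only works when the model wraps its reply in exactly that way. A slightly more tolerant approach, sketched here as an assumption rather than something I ran against the API, is to look for the first `{` and let the standard library decode one JSON object from there. The helper name `parse_model_json` and the sample reply are made up for illustration.

```python
import json


def parse_model_json(reply: str) -> dict:
    """Parse the first JSON object found in a model reply."""
    start = reply.find("{")
    if start == -1:
        raise ValueError("no JSON object found in reply")
    # raw_decode parses a single JSON value and ignores any text after it.
    obj, _end = json.JSONDecoder().raw_decode(reply[start:])
    return obj


# Hypothetical reply with extra text before and after the JSON.
reply = 'Answer:\n{"price": "$945,000", "zipcode": "10024"}\nLet me know if you need more.'
print(parse_model_json(reply))
```

This still raises `json.JSONDecodeError` if the model returns something that is not valid JSON (single-quoted keys, for instance), so you may want to catch that and retry. Separately, assigning `property_photos` directly instead of through an f-string would keep the photos as a JSON array rather than one long string.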

UPDATED ANSWER 06.26.2023

I'm trying to refine this answer. I decided to clean the data slightly before sending it to the API. Doing this gave me cleaner answers. I removed the code from my previous answer, but I left my notes, which I consider important for anyone trying to do something similar.

I found that text-davinci-003 gives more precise answers to the questions than text-davinci-002, but it costs more to use text-davinci-003.

Updated code:

import json
import spacy
import openai
import requests
import re as regex
from bs4 import BeautifulSoup

#####################################################
# The NLP package en_core_web_sm has to be downloaded
# once to your environment. Use the following command
# to accomplish this.
#
# spacy.cli.download("en_core_web_sm")
#
######################################################

openai.api_key = 'my_key'

# I needed to deep clean the soup by removing some excess text. Doing this allowed for better responses from OpenAI.
def remove_unwanted_tags(soup):
    for disclaimer_tag in soup.select('div[class*="Disclaimer__Text"]'):
        disclaimer_tag.decompose()

    for footer_tag in soup.select('div[class*="GlobalFooter__Footer"]'):
        footer_tag.decompose()

    for hidden_tag in soup.select('p[class*="visually-hidden"]'):
        hidden_tag.decompose()

    for data in soup(['style', 'script', 'iframe']):
        data.decompose()

    dirty_soup = ' '.join(soup.find_all(string=True))
    
    # I removed all the UPPER case words in the soup, 
    # because they provided no value. 
    clean_soup = regex.sub(r'\b[A-Z]+\b', '', dirty_soup)
    return clean_soup

def tokenize_text(text):
    nlp = spacy.load("en_core_web_sm")
    sentences = nlp(text)
    return [sentence.text for sentence in sentences.sents]

def get_text_chunks(text, max_tokens_per_chunk=3000):
    chunks = []
    current_chunk = []
    current_token_count = 0
    sentences = tokenize_text(text)
    for sentence in sentences:
        current_chunk.append(sentence)
        current_token_count += len(sentence.split(" "))

        if current_token_count >= max_tokens_per_chunk:
            chunks.append(current_chunk)
            current_chunk = []
            current_token_count = 0

    if current_chunk:
        chunks.append(current_chunk)
    return chunks

def get_property_details_from_gpt(text_chunks, question):
    for chunk in text_chunks:
        # Each chunk is a list of sentences, so join it into plain text first.
        prompt = f"{question}\n\n{' '.join(chunk)}"
        response = openai.Completion.create(
            engine="text-davinci-003",
            prompt=prompt,
            max_tokens=400,
            n=1,
            stop=None,
            temperature=0.5,
        )

        answer = response.choices[0].text.strip()
        if answer.lower() != "unknown" and len(answer) > 0:
            return answer


url = "https://www.corcoran.com/listing/for-sale/170-west-89th-street-2d-manhattan-ny-10024/22053660/regionId/1"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

questions = {'price': 'Provide only the price for this property',
             'description': 'Provide a short description of this property.\nOnly provide complete '
                            'sentences. Please correct the spelling of each word provided.',
             'key features':'What are the key features of this property?\nProvide these items in a semicolon-delimited string.',
             'street address': 'What is the exact street address of this property?\nOnly provide the street address '
                               'and nothing additional.',
             'zipcode': 'What is the state abbreviation and zip code for this property?\nOnly provide the state abbreviation and zipcode.'}

corcoran_properties = {}

strained_soup = remove_unwanted_tags(soup)
text_chunks = get_text_chunks(strained_soup)
for json_key, question in questions.items():
    question_response = get_property_details_from_gpt(text_chunks, question)
    corcoran_properties[json_key] = question_response

property_photos = [element.find('img').attrs['src'] for element in soup.select('div[class*="carousel-item"]')]
corcoran_properties['photos'] = property_photos

corcoran_json = json.dumps(corcoran_properties, indent=4)
print(corcoran_json)

This was the output from the code above:

{
    "price": "$945,000",
    "description": "This property is a 2 bedroom, 1 bathroom Co-op located at 170 West 89th Street in the Upper West Side of Manhattan, New York. Built in 1910, this spacious home features 9ft ceilings, a renovated open kitchen, new bathroom, and washer/dryer. Building amenities include a storage unit, stroller parking and bicycle storage. It has excellent natural light and Southern exposure. The neighborhood is full of iconic architecture, cultural institutions, and historical sites. It is close to Central Park and Riverside Park.",
    "key features": "9ft ceilings; 2 beds; 1 bath; Southern exposure; Renovated open kitchen; New bathroom; Washer/Dryer; Storage unit; Convenient stroller parking; Bicycle storage; Center island; Central air; Dining in living room; Dishwasher; En suite; Excellent light; Hardwood floors; High ceilings; Modern kitchen; New windows; Open kitchen; Pet friendly; Prewar detail; Storage space; Washer/Dryer; Window/Listing agent; Iconic architecture; City-defining structures; Cultural institutions; Historical sites; Central Park; Riverside Park.",
    "street address": "170 West 89th Street",
    "zipcode": "NY, 10024",
    "photos": [
        "https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134542874?w=3840&q=75",
        "https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134542875?w=3840&q=75",
        "https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134542876?w=3840&q=75",
        "https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134542877?w=3840&q=75",
        "https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134542878?w=3840&q=75",
        "https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134542879?w=3840&q=75",
        "https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134542880?w=3840&q=75",
        "https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134542881?w=3840&q=75",
        "https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134543843?w=3840&q=75",
        "https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134543845?w=3840&q=75",
        "https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134543846?w=3840&q=75",
        "https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134543847?w=3840&q=75",
        "https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134543848?w=3840&q=75",
        "https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134543849?w=3840&q=75",
        "https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134543850?w=3840&q=75",
        "https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134543851?w=3840&q=75",
        "https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134543852?w=3840&q=75",
        "https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134543853?w=3840&q=75",
        "https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134543854?w=3840&q=75",
        "https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134543855?w=3840&q=75",
        "https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134542882?w=3840&q=75",
        "https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/ListingFullAPI/NewTaxi/7625191/mediarouting.vestahub.com/Media/134542883?w=3840&q=75",
        "https://media-cloud.corcoranlabs.com/filters:format(webp)/fit-in/768x768/PropertyAPI/NewTaxi/5415/mediarouting.vestahub.com/Media/111409705?w=3840&q=75"
    ]
}

ORIGINAL ANSWER 06.24.2023 (code removed for readability)

I noted that one of the core issues in your code was in this line:

text = soup.get_text(strip=True)

This line was stripping out some of the whitespace that the OpenAI model needed for processing, so adjacent values ran together:

Built in 19102 beds1 bath$945,000Maintenance/Common Charges:$1,68010%

Joining the text nodes with a space keeps them apart:

text = u' '.join(soup.find_all(string=True))

Built in 1910 2 beds 1 bath $945,000 Maintenance/Common Charges:
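To show why the separator matters, here is a small sketch that reproduces the effect with only the standard library's `html.parser` instead of BeautifulSoup. The `TextNodes` class and the HTML snippet are made up for this illustration. As far as I know, BeautifulSoup's `get_text` also accepts a `separator` argument, so `soup.get_text(separator=" ", strip=True)` should be another way to get the spaced version.

```python
from html.parser import HTMLParser


class TextNodes(HTMLParser):
    """Collect the non-empty text nodes of a document, stripped."""

    def __init__(self):
        super().__init__()
        self.nodes = []

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.nodes.append(stripped)


# Hypothetical markup resembling the listing header.
html = ("<div><span>Built in 1910</span><span>2 beds</span>"
        "<span>1 bath</span><span>$945,000</span></div>")

parser = TextNodes()
parser.feed(html)
parser.close()

fused = "".join(parser.nodes)    # no separator: values run together
spaced = " ".join(parser.nodes)  # one space between text nodes

print(fused)   # Built in 19102 beds1 bath$945,000
print(spaced)  # Built in 1910 2 beds 1 bath $945,000
```

With the fused version the model has no way to tell whether the year is 1910 or 19102, which is the kind of ambiguity that leads to odd answers.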

Also, the OpenAI API deals with tokens, not characters, so your chunking code needs to be replaced with one that handles tokenization.
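A minimal sketch of token-based chunking could look like the following. It is an assumption about one way to do it, not the code I originally posted: the function takes the tokenizer as a pair of `encode`/`decode` callables, and the whitespace tokenizer below is only a stand-in so the example runs without extra packages. In practice you would pass in a real tokenizer, for example the `encode` and `decode` methods of an encoding from OpenAI's `tiktoken` package.

```python
from typing import Callable, List, Sequence


def chunk_by_tokens(text: str, max_tokens: int,
                    encode: Callable[[str], List],
                    decode: Callable[[Sequence], str]) -> List[str]:
    """Split text into pieces of at most max_tokens tokens each."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    tokens = encode(text)
    return [decode(tokens[i:i + max_tokens])
            for i in range(0, len(tokens), max_tokens)]


# Stand-in tokenizer for demonstration only: one token per word.
def toy_encode(text: str) -> List[str]:
    return text.split()


def toy_decode(tokens: Sequence[str]) -> str:
    return " ".join(tokens)


chunks = chunk_by_tokens("Built in 1910 2 beds 1 bath $945,000", 3,
                         toy_encode, toy_decode)
print(chunks)  # ['Built in 1910', '2 beds 1', 'bath $945,000']
```

Remember to leave room for the question and for the completion (`max_tokens` in the API call) when choosing the chunk size, because all of them count against the model's context limit.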

I'm unsure of the scalability of this answer, because you will definitely need to think through all the applicable questions related to your data source.

For instance:

What is the address of this property? The address of this property is 170 West 89th Street #2D, New York, NY 10024.

What year was the property built? The property was built in 1910.

What are the maintenance fees for this property? The maintenance fees for this property are $1,680.

Are there any property amenities? There are a few property amenities, including a storage unit, stroller parking, and bicycle storage.

One of the core issues with using the OpenAI API is extracting a clear description from the text provided.
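One cheap sanity check, offered here as a rough idea rather than something I tested at scale, is to measure how many words of the returned description actually occur in the page text and to reject answers that score low. The function name `grounding_ratio`, the sample strings, and any threshold you pick are assumptions for illustration.

```python
import re


def grounding_ratio(answer: str, source: str) -> float:
    """Fraction of the answer's words that also occur in the source text."""
    answer_words = re.findall(r"\w+", answer.lower())
    if not answer_words:
        return 0.0
    source_words = set(re.findall(r"\w+", source.lower()))
    found = sum(1 for word in answer_words if word in source_words)
    return found / len(answer_words)


source = "This spacious 2 bedroom 1 bath home easily converts to 3 bedrooms."
good = "Spacious 2 bedroom 1 bath home"
bad = "Terrace Gardens is a historic apartment building in St. Louis"

print(grounding_ratio(good, source))  # 1.0
print(grounding_ratio(bad, source))   # 0.0
```

A paraphrased description will never score a perfect 1.0, so this is only useful as a coarse filter for answers that have drifted away from the page entirely.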

This line text = ' '.join(soup.find_all(string=True)) produces this text:

170 West 89th Street #2D, New York, NY 10024 Property for sale Skip to main content Sign In Agent Sign in Preferences Open visitors preferences modal BUY RENT SELL NEW DEVELOPMENTS COMMERCIAL SEARCH ALL COMMERCIAL WEXLER HEALTHCARE PROPERTIES AGENTS SEARCH LOCAL AGENTS BROWSE ALL AGENTS BECOME AN AGENT OFFICES SEARCH LOCAL OFFICES SEARCH ALL OFFICES ABOUT US ABOUT CORCORAN CORCORAN LEADERSHIP CORCORAN BRAND BROWSE AFFILIATES BECOME AN AFFILIATE EXPLORE MARKET REPORTS NEIGHBORHOOD GUIDES INHABIT BLOG MEDIA COVERAGE EXCLUSIVE RENTAL BUILDINGS Save share Contact BUY SEARCH in contract WEB ID:    22053660 170 West 89th Street, 2D   Upper West Side,  Manhattan, NY 10024 Upper West Side,  Manhattan, NY 10024 in contract |    Co-op |    Built in 1910     2 beds     1 bath $945,000 Maintenance/Common Charges:     $1,680 10   % Down:     $94,500 Available    Immediately   This is a carousel. Use Next and Previous buttons to navigate. Click on image or "Expand" button to open the fullscreen carousel.
     Not all information is available from these images. Previous Next floorplan map BUY SEARCH in contract WEB ID:    22053660 170 West 89th Street, 2D   Upper West Side,  Manhattan, NY 10024 Upper West Side,  Manhattan, NY 10024 in contract |    Co-op |    Built in 1910     2 beds     1 bath $945,000 Maintenance/Common Charges:     $1,680 10   % Down:     $94,500 Available    Immediately   The Details About  170 West 89th Street, 2D, Upper West Side, Manhattan, NY 10024 COLUMBUS AVENUE and AMSTERDAM AVENUE This spacious 2 bedroom 1 bath home easily converts to 3 bedrooms. Featuring a BRIGHT and quiet southern exposure, the expansive great room (with 9ft ceilings) is what sets this home apart from others. Paired with a renovated open kitchen, new bathroom, and washer/dryer, this is an Upper West Side gem. Building amenities include a storage unit ($25/month), convenient stroller parking and bicycle storage. In... see more Listing Courtesy of    Corcoran   , Stuart    Moss   ,  (212) 821-9140 , RLS data display by Corcoran Group key features   Interior Center island Central air Dining in living room Dishwasher En suite Excellent light light Hardwood floors High ceilings Modern kitchen New windows Open kitchen Pet friendly Prewar detail Storage space Washer/dryer Window A/C Listing agent Stuart Moss Licensed Associate Real Estate Broker Business   :  (212) 821-9140 Mobile   :  (646) 642-0603 Contact me agent-email="SIM@Corcoran.com" Upper West Side Ever wonder why an incalculable number of creative works are set somewhere between 59th and 110th streets, within Central Park West and the Hudson River? All New York City neighborhoods are created equal, but there’s just something about the Upper West Side. Honestly, it’s all in the details: Iconic architecture, city-defining structures like the Dakota, the San Remo, and the El Dorado. Cultural institutions and historical sites of immense international renown line the streets and avenues. 
Having two beloved greenspaces — Central Park and Riverside Park — at its horizontal edges certainly doesn’t hurt the reputation either. All of it and more is why so many New Yorkers choose to call the UWS home. It’s also why, at times, this neighborhood can feel as much an attitude or mindset as it does a physical place. about the building about the building 170 West 89th Street Apartment Building    in  Upper West Side Columbus Avenue And Amsterdam Avenue 1       UNITS 5       STORIES 1910       BUILT View building details Building Sales History FOR 170 West 89th Street FOR 170 West 89th Street FOR 170 WEST 89TH STREET, 2D Sales History for  170 West 89th Street date unit price approx. sq. ft. beds baths 08/03/2021 3A $985,000 0 2 1 02/28/2020 1B $972,000 1000 3 1 02/19/2020 5A $800,000 0 2 1 See 3 more rows Sales History for  170 West 89th Street, 2D date price listing status 08/12/2016 $925,000 Sold 02/09/2012 $652,500 Sold 08/17/2006 $606,000 Sold All information furnished regarding property for sale, rental or financing is from sources deemed reliable, but no warranty or representation is made as to the accuracy thereof and same is submitted subject to errors, omissions, change of price, rental or other conditions, prior sale, lease or financing or withdrawal without notice. All dimensions are approximate. For exact dimensions, you must hire your own architect or engineer. Images may be digitally enhanced photos, virtually staged photos, artists' renderings of future conditions, or otherwise modified, and therefore may not necessarily reflect actual site conditions. Accordingly, interested parties must confirm actual site conditions for themselves, in person. relocation careers healthcare real estate real estate agents homes for sale homes for rent Copyright ©    2023    The Corcoran Group. All Rights Reserved. 
Terms & Conditions Privacy Notice Fair Housing Policy Issues DMCA Notice Accessibility Statement Licensing Sitemap Do Not Sell or Share My Personal Information 590 Madison Avenue New York, NY 10022  |  800.544.4055  |  212.355.3550  |  Fax: 212.223.6381  |  info@corcoran.com Corcoran and the Corcoran logos are trademarks of Corcoran Group LLC. The Corcoran® System is comprised of company owned offices which are owned by a subsidiary of Anywhere Real Estate Inc. and franchised offices which are independently owned and operated. The Corcoran System fully supports the principles of the Fair Housing Act and the Equal Opportunity Act. Listing information is deemed reliable, but is not guaranteed. Licensed in the state of California as CA DRE# 02109201

When processing you might get this:

"description": "This property is a 2 bedroom, 1 bathroom co-op in Upper West Side, Manhattan. It features a bright and quiet southern exposure, an expansive great room, and a renovated open kitchen.",

or this:

"description": "This is a carousel. Use Next and Previous buttons to navigate. Click on image or "Expand" button to open the fullscreen carousel.\n\nThis is a 2 bedroom, 1 bath carousel that easily converts to 3 bedrooms. It has a bright and quiet southern exposure, an expansive great room with 9ft ceilings, a renovated open kitchen, new bathroom, and washer/dryer. Building amenities include a storage unit ($25/month), convenient stroller parking and bicycle storage.",

Getting a clear and concise description will require lots of testing. It might require you to do something like this: details = ''.join([element.text for element in soup.select('div[class*="DetailsAndAgents"]')])

Doing this also creates an issue with obtaining a clear description.

"description": "errace Gardens is a historic apartment building in the St. Louis Place neighborhood of St. Louis, Missouri. The building was constructed in 1892 and was added to the National Register of Historic Places in 2002. The building is four stories tall and is constructed of brick and limestone. The building features a terracotta cornice and a projecting bay window. The building has a U-shaped floor plan and contains 24 apartments.",

or this:

"description": "otally renovated 3 bedroom, 2.5 bath home with an attached 2 car garage. This home has a brand new kitchen with granite countertops and stainless steel appliances. The bathrooms have also been updated with new vanities and fixtures. There is new flooring and fresh paint throughout the home. The home is located on a cul-de-sac and has a large, fenced in backyard.",