Tags: python, http, curl, python-requests, timeout

GET request times out for apartments.com, but the website is not down


I am trying to make a GET request to https://apartments.com, but the request just times out or hangs forever. The page loads fine in the browser, so the web server isn't down.

I'm a web developer, so I've worked a lot with the requests module as well as with my own web servers, and my understanding was that unless the server is down, it should always return something. If I were just missing a specific header or cookie, I'd expect at least a 403 error.

Other websites work perfectly fine, and I run into this issue on different machines on different networks, so the problem probably isn't my firewall or connection. The error occurs with both Python and curl. How can I fix this?

My Python code is as follows:

import datetime

import requests

url = "https://www.apartments.com/"  # Any path on this domain has this issue.

# The OptanonConsent cookie embeds a URL-escaped timestamp; "___" stands in for "%3A" (an escaped ":").
escaped_time_string = datetime.datetime.now().strftime("%a+%b+%d+%Y+%H___%M___%S+GMT-0600+(Mountain+Daylight+Time)").replace("___", "%3A")
print(escaped_time_string)
headers = { # These are the headers that were sent by the browser when I visited the website. I have tried other values for User-Agent, but nothing changes.
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:126.0) Gecko/20100101 Firefox/126.0",
    "Host": "www.apartments.com",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate, br, zstd",
    "Connection": "keep-alive",
    "Cookie": f"cb=1; cul=en-US; ab=%7b%22e%22%3atrue%2c%22r%22%3a%5b%5d%7d; afe=%7b%22e%22%3afalse%7d; fso=%7b%22e%22%3afalse%7d; sr=%7B%22Width%22%3A2560%2C%22Height%22%3A1279%2C%22PixelRatio%22%3A1%7D; _ga=GA1.2.1587172952.1719086370; _gid=GA1.2.2141307871.1719086370; OptanonConsent=isGpcEnabled=0&datestamp={escaped_time_string}&version=202401.2.0&browserGpcFlag=0&isIABGlobal=false&hosts=&consentId=7bc5ce4e-ac96-493a-bcbb-c4a1f2084786&interactionCount=1&landingPath=NotLandingPage&groups=C0001%3A1%2CC0003%3A1%2CC0002%3A1%2CC0004%3A1&AwaitingReconsent=false; gip=%7b%22Display%22%3a%22Orem%2c+UT%22%2c%22GeographyType%22%3a2%2c%22Address%22%3a%7b%22City%22%3a%22Orem%22%2c%22CountryCode%22%3a%22US%22%2c%22State%22%3a%22UT%22%7d%2c%22Location%22%3a%7b%22Latitude%22%3a40.3142%2c%22Longitude%22%3a-111.7099%7d%2c%22IsPmcSearchByCityState%22%3afalse%7d; akaalb_www_apartments_com_main=1719122615~op=apartments_Prd_Edge_US:www_apartments_com_LAX|~rv=77~m=www_apartments_com_LAX:0|~os=0847b47fe1c72dfaedb786f1e8b4b630~id=6fe4004a168575354c8284d8b9dcdec9; ak_bmsc=58307435F65C23FE0EBD31E15CF76945~000000000000000000000000000000~YAAQRqfLF+IFxjaQAQAApWx6Qxji3KakC+xNTdsIM+85gJPMwEo498Se+B3+ugjIZV6wiX2p63okc8LKIQYXV3UJjNcRZe05q8LM3FXGCivYqWKHk1795v1Ismu17ai6hO2NRmiHUnW7LM9WHwZsYyJFqGGeRfs9K2JX3abwcljSCXK55n2XBmdryImrz93faWCWIJqy3fCyQGGJoh8iScFiiegqL4zJg14yojxdLOBSJsVXPBnH0F2uLcs5rNpkgGZ/88uFKf66BOU340ir2Yr9QNi3CV5+90STD0hHITUhbIoG7l5Oc7991FZYZFUrj1IWT9vxnwyJOk76yYqdkN5oRuT3GO1WsPZDXp/7sh6e9gSDsAcQRdhIO3eLP4p6A5fkzc5hqogez5F01U4=; _gat=1; bm_sv=F96ACBC1FDF433C48190DCBABDE1376B~YAAQRqfLFx0GxjaQAQAAdm96QxjvIM6idZiwJvdJ3swWH5GcroUfVp9H3HRvvKzoNhUaYrtPnpUOjnfNVrLvVCN+PZNmaO/efw9Id7eOmJ9nDqVBI7g4q1+S4Y7YYd9mmE1wXAy6fgIEBW4I0aD2vfMXXs768jB4P2D26X94iUlSYjZyWSDqVP3p2oBHV+/TUUwU91FAJMXON8j+hMPXaWF74ogpw4kEWWKme2ireco1yRfqVEeRxXShohFdbhheFWgbwg==~1",
    "Upgrade-Insecure-Requests": "1",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Priority": "u=1",
    "TE": "trailers"
}

req = None
try:
    req = requests.get(url, timeout=10, allow_redirects=True, headers=headers)
except requests.exceptions.Timeout:
    print("Timed out")

if req is not None:  # A Response is falsy on 4xx/5xx, so check for None explicitly.
    print(req.text)
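(As a side note for anyone reproducing this: you can build the request without actually sending it, to confirm exactly which headers would go over the wire. This is a debugging sketch using requests' `PreparedRequest`, with a trimmed-down header set rather than the full set above:)

```python
import requests

# Prepare (but do not send) the request, to inspect exactly what would be sent.
url = "https://www.apartments.com/"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:126.0) Gecko/20100101 Firefox/126.0",
}

prepared = requests.Request("GET", url, headers=headers).prepare()
for name, value in prepared.headers.items():
    print(f"{name}: {value}")
```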

This curl command yields the same results:

curl https://apartments.com

Most headers are constant every time my browser accesses the page, but parts of the Cookie header change on every visit. One of them includes the current time, which is easy to reproduce. It also includes values for bm_sv and bm_mi. I don't have much experience with cookies, but some of them seem to include location data, so they might be the issue.
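(Most of those cookie values are just URL-escaped JSON, so you can decode them locally to see what they contain. A quick sketch using the `ab` value from the request above and an illustrative fragment of the `gip` value; `urllib.parse` is standard library:)

```python
from urllib.parse import unquote_plus

# The "ab" cookie value from the request above, URL-decoded.
ab = unquote_plus("%7b%22e%22%3atrue%2c%22r%22%3a%5b%5d%7d")
print(ab)  # {"e":true,"r":[]}

# A fragment of the "gip" cookie value (shortened here for illustration).
gip_fragment = unquote_plus("%7b%22Display%22%3a%22Orem%2c+UT%22%7d")
print(gip_fragment)  # {"Display":"Orem, UT"}
```

`unquote_plus` also turns `+` into a space, which these cookies use for spaces.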


Solution

  • Sometimes portals check the User-Agent header to see whether the client is a
    real web browser, or to send different content to different browsers or
    devices (desktop, tablet, phone). Some still send something even when a
    program uses the wrong User-Agent, but other pages block access entirely.

    requests sends something like python-requests/2.x as its User-Agent.

    Sometimes even a fake Mozilla/5.0 can help, but this page needed the value
    from a real web browser. If you visit http://httpbin.org/get you can see
    exactly what your browser sends to the server, and you can reuse that.

    I used

    headers = {
        #'User-Agent': 'Mozilla/5.0',
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:127.0) Gecko/20100101 Firefox/127.0',
    }
    
    req = requests.get(url, timeout=10, allow_redirects=True, headers=headers)
    

    and it gave me back some data.

    But I didn't check whether it is useful HTML, or just some JavaScript or a captcha.


    If you have already tried a real User-Agent and you still can't access the
    page from code, then the server may be remembering you by other details,
    and you may need a more complex setup, for example a proxy to change your IP.

    Or you may need to send all of the headers you find in a real web browser.

    Or you may need to use Selenium with a real web browser.
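(The point about requests' default User-Agent is easy to verify locally: requests exposes it via `requests.utils.default_user_agent()`, so you can print what the server sees when you don't override it. A small check, not part of the original answer:)

```python
import requests

# The User-Agent string requests sends when you don't override it,
# e.g. "python-requests/2.32.3" depending on your installed version.
print(requests.utils.default_user_agent())

# Overriding it per-request, as in the answer above:
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:127.0) Gecko/20100101 Firefox/127.0",
}
# req = requests.get("https://www.apartments.com/", timeout=10, headers=headers)
```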