python python-requests-html

Struggling to extract JSON from a web page


I am trying to scrape window.PRELOADED_STATE from the following URL using requests, but I can't isolate the element I want so that I can parse it as JSON.

I tried the below code first.

response = requests.get('https://www.racingpost.com/profile/horse/431262/ready-for-action-ii')

I successfully got a response from the server, and when I view the response text I can see the data I want in the HTML, but I can't narrow it down to the window.PRELOADED_STATE element. Once I have that element, I want to use .json() on it to get the data into a dictionary.


Solution

  • Use a regular expression to extract everything on the line between window.PRELOADED_STATE = and the final ;.

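    import json
    import re
    import requests

    response = requests.get('https://www.racingpost.com/profile/horse/431262/ready-for-action-ii')
    # Escape the dot and allow any amount of whitespace around "=";
    # the greedy group captures up to the last ";" on the line.
    state_match = re.search(r'window\.PRELOADED_STATE\s*=\s*(.*);', response.text)
    if state_match:
        # json.loads parses the captured JSON text into Python objects.
        preloaded_state = json.loads(state_match.group(1))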
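
If the page ever puts more JavaScript on the same line after the assignment, the greedy `(.*);` capture above would pick up that extra text and `json.loads` would fail. A possible variation, sketched here rather than tested against the live site, uses the regex only to find where the assignment starts and then lets `json.JSONDecoder().raw_decode` parse a single JSON value from that position. The sample HTML, its keys, and the helper name `extract_preloaded_state` are made up for illustration; the real page's structure is an assumption.

```python
import json
import re

# Hypothetical stand-in for response.text; the keys inside the state are invented.
SAMPLE_HTML = """<html><body>
<script>
window.PRELOADED_STATE = {"profile": {"id": 431262, "slug": "ready-for-action-ii"}}; window.other = 1;
</script>
</body></html>"""

# Matches the assignment up to (but not including) the start of the JSON value.
ASSIGNMENT = re.compile(r'window\.PRELOADED_STATE\s*=\s*')


def extract_preloaded_state(html):
    """Return the value assigned to window.PRELOADED_STATE, or None if absent."""
    match = ASSIGNMENT.search(html)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # returns (value, end_index); whatever follows the value is ignored.
    state, _end = json.JSONDecoder().raw_decode(html, match.end())
    return state


state = extract_preloaded_state(SAMPLE_HTML)
print(state["profile"]["id"])
```

With a live request, `extract_preloaded_state(response.text)` would take the place of the sample string. Note that `raw_decode` still raises `json.JSONDecodeError` if the assigned value is not valid JSON (for example, a JavaScript object literal with unquoted keys).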