python, parsing, web-scraping

Getting the gold price from the web - python


I am trying to get the gold price and its percentage increase/decrease using web scraping. I am new to web scraping. The main issue is that the HTML I scrape is different from the website's HTML as shown in my browser (Google Chrome).

Hello

I am trying to write a Python program that scrapes the gold price from this website: https://goldprice.org/. I am completely new to web scraping, but I have managed to come up with the following Python code:

# this should print the website's html
import requests
from bs4 import BeautifulSoup

url = "https://goldprice.org/"

req = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})

data = req.content

soup = BeautifulSoup(data, "html.parser")
print(soup)

This works and prints the HTML.

The next step is to parse the data and find two pieces of information: the gold price and its percentage increase/decrease.

I have no idea how to do this, but I tried to implement a searching algorithm like so:

def SearchSoup(toFind):
    """Finds all the indexes of toFind in the soup"""
    mentionIdx = []
    current = ""
    strLen = len(toFind)

    start = 0
    for i in range(len(soup)):
        if i - start == strLen:
            start = i

        if soup[i] != toFind[i - start]:
            start = i + 1
            current = ""
            continue
        
        current += soup[i]
        if toFind == current:
            mentionIdx.append(start)
    
    return mentionIdx

However, this gives me a traceback error when searching the soup:

    KeyError: 0

It stems from the line "if soup[i] != toFind[i - start]:"; the last frame of the traceback, inside BeautifulSoup, is:

    line 1573, in __getitem__
        return self.attrs[key]
               ~~~~~~~~~~^^^^^
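The KeyError comes from indexing the soup directly. On a BeautifulSoup object (as on any Tag), soup[key] looks up an HTML attribute, so soup[0] asks for an attribute named 0, which is exactly the self.attrs[key] line in the traceback. Likewise, len(soup) counts child nodes, not characters. A minimal sketch of a working version searches the HTML as a plain string with str.find instead; the name search_soup is only illustrative:

```python
def search_soup(soup, to_find):
    """Return the start index of every occurrence of to_find in the soup's HTML.

    str(soup) turns a BeautifulSoup object into its HTML text, so ordinary
    string searching works (soup[0] would raise KeyError: 0).
    """
    html = str(soup)
    indexes = []
    start = html.find(to_find)
    while start != -1:
        indexes.append(start)
        # Resume one character later so overlapping matches are also found.
        start = html.find(to_find, start + 1)
    return indexes


# Works on anything str() can convert, including a plain HTML string.
print(search_soup("<p>gold</p><p>gold</p>", "gold"))
```

With the soup from the question this becomes search_soup(soup, "gold"); soup.get_text() can be used in place of str(soup) to search only the visible text. Even a correct search will not find the price, though, because the price is not in the downloaded HTML at all.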

The main problem is that the gold price and its percentage change do not appear anywhere in the scraped HTML. I read through it line by line, and they weren't there. Furthermore, when I opened the website itself and inspected its HTML in the developer tools (fn + F12), it was different from the HTML I had scraped.

I am quite lost here :)


Solution

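  • The price is missing from the scraped HTML because the page fetches it with JavaScript (an Ajax request) after the initial page load, and requests does not run JavaScript. To get the gold price and its change, call that Ajax API directly: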

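    import requests
    
    api_url = "https://data-asg.goldprice.org/dbXRates/USD"
    
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0"
    }
    
    response = requests.get(api_url, headers=headers, timeout=10)
    response.raise_for_status()  # raise on HTTP errors (e.g. 403) instead of failing inside .json()
    data = response.json()
    
    # print(data)
    
    # xauPrice = gold price, pcXau = gold percentage change
    print(data["items"][0]["xauPrice"], data["items"][0]["pcXau"])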
    

    Prints:

    2024.405 -4.4504
    

    The full response looks like this; it contains both gold (Xau) and silver (Xag) fields:

    {
        "ts": 1701714890300,
        "tsj": 1701714885583,
        "date": "Dec 4th 2023, 01:34:45 pm NY",
        "items": [
            {
                "curr": "USD",
                "xauPrice": 2024.0325,
                "xagPrice": 24.514,
                "chgXau": -94.6625,
                "chgXag": -1.2275,
                "pcXau": -4.468,
                "pcXag": -4.7686,
                "xauClose": 2118.695,
                "xagClose": 25.7415
            }
        ]
    }
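Because the response is plain JSON, the parsing can be written and checked without touching the network. A minimal sketch, assuming field meanings inferred from the sample rather than from any documentation: xauPrice is the current price, xauClose the previous close, chgXau = xauPrice - xauClose, and pcXau is that change as a percentage of xauClose. The sample numbers agree: 2024.0325 - 2118.695 = -94.6625, and -94.6625 / 2118.695 * 100 ≈ -4.468 (the silver fields follow the same pattern). GoldQuote and parse_gold_quote are hypothetical names:

```python
import math
from dataclasses import dataclass


@dataclass
class GoldQuote:
    """Gold fields from one entry of the "items" list (meanings inferred from the sample)."""
    currency: str          # curr
    price: float           # xauPrice: current gold price
    change: float          # chgXau: price minus previous close
    percent_change: float  # pcXau: change as a % of the previous close
    previous_close: float  # xauClose


def parse_gold_quote(data: dict) -> GoldQuote:
    """Pull the gold fields out of a parsed dbXRates response."""
    item = data["items"][0]
    return GoldQuote(
        currency=item["curr"],
        price=item["xauPrice"],
        change=item["chgXau"],
        percent_change=item["pcXau"],
        previous_close=item["xauClose"],
    )


# Gold fields copied from the sample response above (no network needed).
sample = {
    "items": [
        {
            "curr": "USD",
            "xauPrice": 2024.0325,
            "chgXau": -94.6625,
            "pcXau": -4.468,
            "xauClose": 2118.695,
        }
    ]
}

quote = parse_gold_quote(sample)

# Sanity-check the inferred meanings against each other.
recomputed_pct = (quote.price - quote.previous_close) / quote.previous_close * 100
assert math.isclose(quote.price - quote.previous_close, quote.change, abs_tol=1e-6)
assert math.isclose(recomputed_pct, quote.percent_change, abs_tol=1e-3)

print(f"{quote.currency} {quote.price} ({quote.percent_change:+.2f}%)")
```

With the live API, pass the parsed response instead: parse_gold_quote(requests.get(api_url, headers=headers, timeout=10).json()). Keeping fetching and parsing separate means the parsing can be tested on saved responses like the one above.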