I am trying to get the gold price and its percentage increase/decrease using web scraping. I am new to web scraping. The main issue is that the scraped HTML differs from the website's HTML (on Google)
Hello
I am trying to write a Python program to scrape the gold price from this website: https://goldprice.org/. I am completely new to web scraping, but I have managed to come up with the following lines of Python:
# this should print the website's html
import requests
from bs4 import BeautifulSoup
url = "https://goldprice.org/"
req = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
data = req.content
soup = BeautifulSoup(data, "html.parser")
print(soup)
This works and prints the HTML when run.
The next step is to parse the data and find the following pieces of information:
data to find (the gold price and the percentage increase / decrease)
I have no idea how to do this, but I tried to implement a searching algorithm like so:
def SearchSoup(toFind):
    """Finds all the indexes of toFind in the soup"""
    mentionIdx = []
    current = ""
    strLen = len(toFind)
    start = 0
    for i in range(len(soup)):
        if i - start == strLen:
            start = i
        if soup[i] != toFind[i - start]:
            start = i + 1
            current = ""
            continue
        current += soup[i]
        if toFind == current:
            mentionIdx.append(start)
    return mentionIdx
This gives me a traceback when it runs against the soup:
KeyError: 0
It stems from the line "if soup[i] != toFind[i - start]:" and ends at line 1573, in __getitem__:
return self.attrs[key]
~~~~~~~~~~^^^^^
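The KeyError above most likely comes from indexing the BeautifulSoup object directly: as the traceback shows, `soup[i]` is treated as an attribute lookup (`self.attrs[key]`), not as access to the i-th character, and `len(soup)` counts child nodes rather than characters. A minimal sketch of a substring search that works on plain text instead, assuming you first convert the soup with `str(soup)`. The function name `find_all_indexes` and the sample markup are made up for illustration:

```python
def find_all_indexes(text, to_find):
    """Return the start index of every occurrence of to_find in text."""
    indexes = []
    if not to_find:
        # An empty search string would match at every position; return nothing.
        return indexes
    start = text.find(to_find)
    while start != -1:
        indexes.append(start)
        # Advance by one character so overlapping matches are also found.
        start = text.find(to_find, start + 1)
    return indexes


# Stand-in for str(soup); this markup is invented for the example.
html = '<div class="price">gold</div><p>gold and silver</p>'
print(find_all_indexes(html, "gold"))
```

With the real page you would call `find_all_indexes(str(soup), "some text")`. This only fixes the error; it will not find the price if the price is not in the downloaded HTML in the first place.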
The main problem is that the scraped HTML does not seem to include the gold price or its percentage increase/decrease. I manually read through every line of it, but the values weren't there. Furthermore, when I opened the actual website and inspected its HTML (by pressing Fn + F12), it differed from the scraped HTML.
I am quite lost here :)
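The prices are not in the HTML you download; the page appears to load them separately through an Ajax call. To get the gold price and its change, query that Ajax API directly: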
import requests
api_url = "https://data-asg.goldprice.org/dbXRates/USD"
headers = {
"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0"
}
data = requests.get(api_url, headers=headers).json()
# print(data)
print(data["items"][0]["xauPrice"], data["items"][0]["pcXau"])
Prints:
2024.405 -4.4504
The full response looks like this (this includes gold and silver):
{
    "ts": 1701714890300,
    "tsj": 1701714885583,
    "date": "Dec 4th 2023, 01:34:45 pm NY",
    "items": [
        {
            "curr": "USD",
            "xauPrice": 2024.0325,
            "xagPrice": 24.514,
            "chgXau": -94.6625,
            "chgXag": -1.2275,
            "pcXau": -4.468,
            "pcXag": -4.7686,
            "xauClose": 2118.695,
            "xagClose": 25.7415
        }
    ]
}
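If you want something a little more defensive than indexing `data["items"][0]` inline, here is a small sketch that pulls the two gold fields out of an already parsed response. It assumes the response keeps the shape shown above; the helper name `extract_gold` is hypothetical, and the sample dict is copied from the response above so the example runs without a network call:

```python
def extract_gold(data):
    """Return (price, percent_change) for gold from a parsed API response."""
    items = data.get("items") or []
    if not items:
        raise ValueError("response contains no 'items' entries")
    first = items[0]
    # Field names taken from the sample response: xauPrice and pcXau.
    return first["xauPrice"], first["pcXau"]


# Trimmed copy of the sample response shown above.
sample = {
    "ts": 1701714890300,
    "items": [
        {
            "curr": "USD",
            "xauPrice": 2024.0325,
            "chgXau": -94.6625,
            "pcXau": -4.468,
            "xauClose": 2118.695,
        }
    ],
}

price, pct = extract_gold(sample)
direction = "up" if pct > 0 else "down" if pct < 0 else "unchanged"
print(f"Gold: {price:.2f} USD ({direction} {abs(pct):.2f}%)")
```

In the real script you would pass the result of `requests.get(api_url, headers=headers).json()` instead of `sample`.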