Tags: python, html, web-scraping

Is there a way to get the data a browser's Inspect panel shows for an element on a website?


I'm trying to scrape the world population from this website: https://www.worldometers.info/world-population/ but I only get the HTML markup, not the actual numbers.

I already tried finding the children of the element I want the data from, and I also tried printing the whole element, but nothing worked.

import requests
from bs4 import BeautifulSoup

# Download the page and parse its HTML
r = requests.get('https://www.worldometers.info/world-population/')
soup = BeautifulSoup(r.text, 'html.parser')

# This only finds the one element shown below
current_population = soup.find('div', {'class': 'maincounter-number'}).find_all('span', recursive=False)

print(current_population)

This is the element the information is stored in:

<span class="rts-counter" rel="current_population">retrieving data... </span>

And in Inspect mode you can see this:

<span class="rts-counter" rel="current_population">
  <span class="rts-nr-sign"></span>
  <span class="rts-nr-int rts-nr-10e9">7</span><span class="rts-nr-thsep">,</span>
  <span class="rts-nr-int rts-nr-10e6">703</span><span class="rts-nr-thsep">,</span>
  <span class="rts-nr-int rts-nr-10e3">227</span><span class="rts-nr-thsep">,</span>
  <span class="rts-nr-int rts-nr-10e0">630</span>
</span>

My script always returns the first version (the "retrieving data..." placeholder), but I want the second one, with the digits shown in Inspect mode.

Here is a screenshot of Inspect mode.


Solution

  • You need a tool that executes JavaScript, such as Selenium. requests only downloads the raw HTML and never runs scripts, so you only see the placeholder. The number itself is filled in by a counter generated in this script: https://www.realtimestatistics.net/rts/RTSp.js

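    from selenium import webdriver
    from selenium.webdriver.common.by import By

    d = webdriver.Chrome()
    d.get('https://www.worldometers.info/world-population/')
    # find_element_by_css_selector() no longer exists in current Selenium; use find_element(By...)
    print(d.find_element(By.CSS_SELECTOR, '[rel="current_population"]').text)
    d.quit()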
    

    You could try writing your own version of that JavaScript script, but I wouldn't recommend it.

    I didn't need an explicit wait condition for the Selenium script, but one could be added.