[SOLVED] Find Location of All Numbers with a Comma


I have been scraping some HTML pages with Beautiful Soup, trying to extract some updated financial data. I only care about numbers that contain a comma, i.e. 100,000 or 12,000,000, but not 450, for example. The goal is to find the location of the comma-separated numbers within a string and then extract the entire sentence each one is in.
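Here is a minimal sketch of the "find the location" part, using only Python's standard re module. It assumes that a "number with a comma" means digits grouped in threes, as in 100,000 or 12,000,000. The pattern name COMMA_NUMBER, the helper comma_number_locations and the sample sentence are hypothetical. re.finditer yields one match object per number, and its start() and end() give the number's position within the string.

```python
import re

# Assumption: a "number with a comma" is digits grouped in threes by commas
# (100,000 or 12,000,000). The lookarounds stop a match from starting or
# ending in the middle of a longer run of digits, so 450 is never matched.
COMMA_NUMBER = re.compile(r"(?<!\d)\d{1,3}(?:,\d{3})+(?!\d)")


def comma_number_locations(text):
    """Return (start, end, matched_text) for every comma-separated number in text."""
    return [(m.start(), m.end(), m.group()) for m in COMMA_NUMBER.finditer(text)]


# Made-up sample text for illustration.
sample = "Sales were 12,000,000 dollars across 450 stores and 100,000 employees."
for start, end, number in comma_number_locations(sample):
    print(start, end, number)
```

The offsets index into the original string, so they can be used to slice it, for example to widen a match out to the sentence around it.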

I moved the entire scrape into a list of strings, and from that list I want to extract all numbers that contain a comma.
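As a rough sketch of how this could work over a list of strings, re.findall can be called on each string in turn, which returns every non-overlapping match, so a string holding several comma-separated numbers yields all of them. The list contents below are a hypothetical stand-in for the scraped strings, and the pattern assumes thousands are grouped in threes by commas.

```python
import re

# Assumption: comma-separated numbers group digits in threes (100,000, 12,000,000).
COMMA_NUMBER = re.compile(r"(?<!\d)\d{1,3}(?:,\d{3})+(?!\d)")

# Hypothetical stand-in for the list of scraped strings.
strings = [
    "Total compensation was 1,250,000 in fiscal 2019.",
    "The board met 12 times.",
    "Grants of 40,000 and 25,500 shares were approved.",
]

# Record which string each number came from, so its sentence can be found later.
matches = []
for index, s in enumerate(strings):
    for number in COMMA_NUMBER.findall(s):
        matches.append((index, number))

print(matches)
```

Keeping the list index next to each number makes it easy to go back to the original string afterwards.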

import requests
from bs4 import BeautifulSoup

url = 'https://www.sec.gov/Archives/edgar/data/354950/000035495020000024/hd-2020proxystatement.htm'
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")
text = soup.find_all(string=True)  # every text node in the page
strings = []
for node in text:
    strings.append(str(node))  # convert each NavigableString to a plain str

I thought about the following re code, but I am not sure it will extract all instances, i.e. a single string in the list may contain several comma-separated numbers.

number  = re.sub('[^>0-9,]', "", text)
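A small sketch of why the re.sub idea falls short. re.sub removes every character that is not a digit, a comma or '>', so all the numbers in a string (including ones without a comma, like 3) are glued into one run. It also expects a single string rather than a list, so it would have to be applied to each element separately. re.findall returns each match as its own list item instead. The sample line and the thousands-grouping pattern are assumptions for illustration.

```python
import re

line = "Salary 1,000,000 and bonus 250,000 for 3 years"

# The original idea: strip everything except digits, commas and '>'.
# What survives from all the numbers runs together into one string.
stripped = re.sub(r"[^>0-9,]", "", line)

# findall instead returns each comma-separated number as a separate item
# (assumption: digits are grouped in threes by commas).
numbers = re.findall(r"(?<!\d)\d{1,3}(?:,\d{3})+(?!\d)", line)

print(stripped)
print(numbers)
```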

Any thoughts would be a huge help! Thank you.

Solution

You can use:

from bs4 import BeautifulSoup
import requests, re

url = 'https://www.sec.gov/Archives/edgar/data/354950/000035495020000024/hd-2020proxystatement.htm'
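soup = BeautifulSoup(requests.get(url).text, "html5lib")  # the html5lib parser must be installed separately
for el in soup.find_all(True):  # loop over every element in the page
    # a parent element's text includes its children's text, so the same match can print more than once
    if re.search(r"\d+,\d+", el.text):  # any digits-comma-digits run in the element's text
        print(el.text)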
        # print("END OF ELEMENT\n") # debug only
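The answer above prints each element's text rather than individual sentences. A rough sketch of the last step in the question, keeping only the sentences that contain a comma-separated number, might look like the example below. It assumes a naive sentence boundary ('.', '!' or '?' followed by whitespace), so abbreviations such as "Inc." or "No." will split sentences too. The helper name and the sample paragraph are hypothetical. The same function could be called on el.text inside the answer's loop.

```python
import re

# Assumption: comma-separated numbers group digits in threes (100,000, 12,000,000).
COMMA_NUMBER = re.compile(r"(?<!\d)\d{1,3}(?:,\d{3})+(?!\d)")


def sentences_with_comma_numbers(text):
    """Return the sentences in text that contain at least one comma-separated number.

    Naive assumption: a sentence ends at '.', '!' or '?' followed by whitespace,
    so a decimal like 2.5 is not split, but an abbreviation like "Inc." is.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if COMMA_NUMBER.search(s)]


# Made-up sample paragraph for illustration.
paragraph = (
    "The company has 450 stores. It employed 35,000 people last year. "
    "Widget sales rose 2.5 percent to 12,500 units!"
)
for sentence in sentences_with_comma_numbers(paragraph):
    print(sentence)
```

For messy real-world text, a dedicated sentence tokenizer would handle abbreviations better than this single regex split.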