Tags: python, web-scraping, beautifulsoup, stocks

BeautifulSoup Scraping U.S. News Today Stock <table>


Using Python, I am trying to scrape a table of stocks under $10 from the U.S. News Money "Stocks Under $10" page and then add each ticker to a list (so that I can iterate through each stock). Currently, I have this code:

import bs4 as bs
import requests

resp = requests.get('https://money.usnews.com/investing/stocks/stocks-under-10')
soup = bs.BeautifulSoup(resp.text, "lxml")
table = soup.find('table', {'class': 'table stock full-row search-content'})
tickers = []
for row in table.findAll('tr')[1:]:  # skip the header row
    ticker = str(row.findAll('td')[0].text)
    tickers.append(ticker)

I keep getting the error:

Traceback (most recent call last):
  File "sandp.py", line 98, in <module>
    sandp(0)
  File "sandp.py", line 40, in sandp
    for row in table.findAll('tr')[1:]:
AttributeError: 'NoneType' object has no attribute 'findAll'

Solution

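  • The page is dynamic: the stock list is filled in by JavaScript after the initial HTML loads, so the markup that requests receives does not contain the table, and soup.find returns None. You can use selenium to render the page in a browser first: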

    import collections
    import re

    from bs4 import BeautifulSoup as soup
    from selenium import webdriver
    from selenium.common.exceptions import WebDriverException
    from selenium.webdriver.common.by import By

    # Selenium 3 style; newer Selenium releases take a Service object instead
    d = webdriver.Chrome('/path/to/chromedriver')
    d.get('https://money.usnews.com/investing/stocks/stocks-under-10')
    # Keep clicking "Load More" until the link can no longer be found or clicked
    while True:
      try:
        d.find_element(By.LINK_TEXT, "Load More").click()
      except WebDriverException:
        break
    company = collections.namedtuple('company', ['name', 'abbreviation', 'description', 'stats'])
    # (tag, attributes) pairs to look up inside each search result
    headers = [['a', {'class':'search-result-link'}], ['a', {'class':'text-muted'}], ['p', {'class':'text-small show-for-medium-up ellipsis'}], ['dl', {'class':'inline-dl'}], ['span', {'class':'stock-trend'}], ['div', {'class':'flex-row'}]]
    # Text of each lookup per result, or None when the element is missing
    final_data = [[getattr(i.find(a, b), 'text', None) for a, b in headers] for i in soup(d.page_source, 'html.parser').find_all('div', {'class':'search-result flex-row'})]
    # Clean the description and split each stats string into tokens
    new_data = [[i[0], i[1], re.sub(r'\n+\s{2,}', '', i[2]), [re.findall(r'[\$\w\.%/]+', stat) for stat in i[3:]]] for i in final_data]
    # Keep only the tokens containing a digit and label them in order
    final_results = [i[:3]+[dict(zip(['Price', 'Daily Change', 'Percent Change'], filter(lambda x:re.findall(r'\d', x), i[-1][0])))] for i in new_data]
    new_results = [company(*i) for i in final_results]
    

    Output (first company):

    company(name=u'Aileron Therapeutics Inc', abbreviation=u'ALRN', description=u'Aileron Therapeutics, Inc. is a clinical stage biopharmaceutical company, which focuses on developing and commercializing stapled peptides. Its ALRN-6924 product targets the tumor suppressor p53 for the treatment of a wide variety of cancers. It also offers the MDMX and MDM2. The company was founded by Gregory L. Verdine, Rosana Kapeller, Huw M. Nash, Joseph A. Yanchik III, and Loren David Walensky in June 2005 and is headquartered in Cambridge, MA.more\n', stats={'Daily Change': u'$0.02', 'Price': u'$6.04', 'Percent Change': u'0.33%'})
    

    Edit:

    All abbreviations:

    abbrevs = [i.abbreviation for i in new_results]
    

    Output:

    [u'ALRN', u'HAIR', u'ONCY', u'EAST', u'CERC', u'ENPH', u'CASI', u'AMBO', u'CWBR', u'TRXC', u'NIHD', u'LGCY', u'MRNS', u'RFIL', u'AUTO', u'NEPT', u'ARQL', u'ITUS', u'SRAX', u'APTO']
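
The stats parsing in the code above is packed into two dense comprehensions, so here is a minimal standalone sketch of just that step. The helper name parse_stats and the sample string are hypothetical; the sample only imitates the kind of text an 'inline-dl' element might contain and was not copied from the live page.

```python
import re

def parse_stats(raw):
    """Turn a raw stats string into a dict of price and change figures."""
    # Split the text into tokens made of word characters, $, ., % and /
    tokens = re.findall(r'[\$\w\.%/]+', raw)
    # Keep only tokens that contain a digit, dropping labels such as "Price"
    numeric = [t for t in tokens if re.search(r'\d', t)]
    # Pair values with labels in order; zip stops at the shorter sequence
    return dict(zip(['Price', 'Daily Change', 'Percent Change'], numeric))

# Hypothetical text, shaped like the contents of the 'inline-dl' element
sample = "Price\n  $6.04\n  Daily Change\n  $0.02\n  Percent Change\n  0.33%"
print(parse_stats(sample))
```

Note that the character class does not include '-', so a leading minus sign on a negative change is dropped by this pattern. If the direction of the change matters, the pattern would need to be extended.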