python web-scraping stocktwits

'NoneType' Error While WebScraping StockTwits


I am trying to write a script that simply reads and prints all of the tickers on a particular account's watchlist. I have managed to navigate to the page and print the user's name from the HTML. Now I want to print all the tickers he follows by using find() to locate the watchlist and then .find_all() to find each ticker, but every time I use find() to navigate to the watchlist tickers it returns 'NoneType'.

Here is my code:

import requests
import xlwt
from xlutils.copy import copy
from xlwt import Workbook
import xlrd
import urllib.request as urllib2
from bs4 import BeautifulSoup

hisPage = ("https://stocktwits.com/GregRieben/watchlist")

page = urllib2.urlopen(hisPage)

soup = BeautifulSoup(page, "html.parser")

his_name = soup.find("span", {"class":"st_33aunZ3 st_31YdEUQ st_8u0ePN3 st_2mehCkH"})

name = his_name.text.strip()
print(name)

watchlist = soup.find("div", {"class":"st_16989tz"})

tickers = watchlist.find_all('span', {"class":"st_1QzH2P8"})

print(type(watchlist))
print(len(watchlist))

Here I want the highlighted value (LSPD.CA) and all the others after it (they all have exactly the same HTML structure):

[Image: highlighted value]

Here is my error:

[Image: error]


Solution

  • That content is added dynamically by an API call, so it is not present in the response to your request for the original URL; without a browser running the page's JavaScript, the DOM is never updated. That is why find() returns None for the watchlist div. You can find the API call for the watchlist in the browser's network traffic. It returns JSON, and you can extract what you want from that.

    import requests
    
    r = requests.get('https://api.stocktwits.com/api/2/watchlists/user/396907.json').json()
    tickers = [i['symbol'] for i in r['watchlist']['symbols']]
    print(tickers)
    

    If you need the user id to pass to the API, it is present in a number of places in the response from your original URL. I am using a regex to grab it from a script tag:

    import requests, re
    
    p = re.compile(r'subjectUser":{"id":(\d+)')
    
    with requests.Session() as s:
        r = s.get('https://stocktwits.com/GregRieben/watchlist')
        user_id = p.findall(r.text)[0]
        r = s.get('https://api.stocktwits.com/api/2/watchlists/user/' + user_id + '.json').json()
        tickers = [i['symbol'] for i in r['watchlist']['symbols']]
    print(tickers)
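
If you want to avoid another 'NoneType'-style failure when the page or the JSON changes shape, the two extraction steps above could be wrapped in small helpers that return None or an empty list instead of raising. This is only a minimal sketch: the helper names (extract_user_id, extract_tickers) and the sample data are hypothetical, and it assumes the response keeps the 'watchlist' / 'symbols' / 'symbol' layout used above. It makes no network calls, so you can try it offline.

```python
import re

# Same pattern as in the answer above; the page markup may change over time.
USER_ID_PATTERN = re.compile(r'subjectUser":{"id":(\d+)')


def extract_user_id(html):
    """Return the user id found in the page source, or None if it is absent."""
    match = USER_ID_PATTERN.search(html)
    return match.group(1) if match else None


def extract_tickers(payload):
    """Return the symbols from a watchlist payload, or [] if keys are missing."""
    symbols = payload.get("watchlist", {}).get("symbols", [])
    return [item["symbol"] for item in symbols if "symbol" in item]


# Hypothetical sample data, shaped like the responses described above.
sample_html = '<script>{"subjectUser":{"id":396907,"name":"x"}}</script>'
sample_payload = {
    "watchlist": {"symbols": [{"symbol": "LSPD.CA"}, {"symbol": "AAPL"}]}
}

print(extract_user_id(sample_html))
print(extract_tickers(sample_payload))
print(extract_user_id("<html></html>"))  # no match, so None rather than an IndexError
```

In the session code above you would pass r.text to extract_user_id and the decoded JSON to extract_tickers, and check for None before building the API URL.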