python · web-scraping · network-programming

Can't scrape Discover bank page


I've looked around and tried a bunch of different things, but can't seem to find any info on this topic.

I'm trying to scrape info from my bank (Discover) and wrote a script to do so. The requests themselves go through fine, but they return a "logged out" page instead of the desired homepage with my balance.

My messy code is as follows:

import requests
from bs4 import BeautifulSoup as bs

def scrapeDiscover():
    URL = 'https://portal.discover.com/customersvcs/universalLogin/signin'

    HEADERS = {
        'User-Agent': 'Mozilla/5.0 (Windows NT; Windows NT 6.2; en-US) WindowsPowerShell/4.0',
        'Origin': 'https://portal.discover.com',
        'Referer': 'https://portal.discover.com/customersvcs/universalLogin/ac_main',
    }

    # Form data copied from the login POST captured in the browser.
    PAYLOAD = {
        'userID' : 'username',
        'password' : 'password',
        'choose-card' : 'Credit Card',
        'pm_fp' : 'version=-1&pm_fpua=mozilla/5.0 (x11; linux x86_64) applewebkit/537.36 (khtml, like gecko) chrome/95.0.4638.69 safari/537.36|5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36|Linux x86_64&pm_fpsc=24|1920|1080|1053&pm_fpsw=&pm_fptz=-6&pm_fpln=lang=en-US|syslang=|userlang=&pm_fpjv=0&pm_fpco=1',
        'currentFormId' : 'login',
        'userTypeCode' : 'C',
        'rememberOption' : 'on',
    }

    # Use one Session so cookies set during login carry over to later requests.
    s = requests.Session()
    login_req = s.post(URL, headers=HEADERS, data=PAYLOAD)

    # Fetch the account homepage with the (hopefully) logged-in session.
    soup = bs(s.get('https://card.discover.com/cardmembersvcs/achome/homepage').text, 'html.parser')

    balance = soup.text
    print(balance)

scrapeDiscover()
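
To confirm it's the login POST itself being rejected (rather than the follow-up GET), these are the sanity checks I'd add inside the function; nothing here is verified output, just standard requests attributes:

print(login_req.status_code)
print(login_req.url)                        # final URL after any redirects
print([r.url for r in login_req.history])   # redirect chain, if any
print(list(s.cookies.get_dict()))           # cookie names the server set on the session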

I also looked at the POST request info the browser actually sends (screenshot omitted; the captured form data and cookies are in the pastebin linked below).

Any help would be super appreciated; even just a suggestion would help a ton. Thanks so much, all! Let me know if more information is needed.

EDIT: Added information

I imagine there's probably a missing cookie or token in the POST request, but I've pored over the captured data many times and can't find anything that works when implemented, or even tell whether I'm implementing it correctly.

A couple of things stand out to me:

SSID: In the 'Form Data' of the POST request that works, there's an 'ssid' field containing a long string. It changes every time, though, and I assumed it stood for 'session ID' and that I didn't need it, since my code creates a new session anyway.

ssid: 0433c923-6f48-4832-8d6d-b26c5b0e6d4-1637097180562

STRONGAUTHSVCS: Another thing that stood out was this 'STRONGAUTHSVCS' variable, nested within the long string of cookies in both the request and received headers.

STRONGAUTHSVCS=SASID=null&SATID=b081-

sectoken: Lastly, I saw the word 'token' and thought this could be it: a variable in the cookies named 'sectoken'. No idea what it is, though, or how I would implement it.

sectoken=hJNQgh7EOnH1xx1skqQqftbV/kE=

I've tried my best at implementing all of these in the headers and payload of my code, but it seemed to have no effect on the output; a sketch of roughly what I tried follows the pastebin link. I've attached a pastebin of the site cookies and form data captured (minus any sensitive data). If anyone has any ideas, I'd be super thankful!

https://pastebin.com/PNnV6Mpw
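
For reference, here's roughly what I mean by "implementing them": fetch the login page first so the session picks up cookies like sectoken and STRONGAUTHSVCS from the Set-Cookie headers, then copy the hidden form fields (which I'm guessing include the per-request ssid) into the payload. The assumption that these values live in hidden inputs is mine and may be wrong:

import requests
from bs4 import BeautifulSoup as bs

LOGIN_URL = 'https://portal.discover.com/customersvcs/universalLogin/signin'

s = requests.Session()
# GET the login page first: the session should store cookies such as
# sectoken and STRONGAUTHSVCS from the Set-Cookie response headers.
login_page = s.get(LOGIN_URL)

# Copy every hidden input (hopefully including the per-request ssid)
# into the payload, then add the credentials on top.
soup = bs(login_page.text, 'html.parser')
payload = {
    tag['name']: tag.get('value', '')
    for tag in soup.find_all('input', type='hidden')
    if tag.get('name')
}
payload.update({'userID': 'username', 'password': 'password'})

login_req = s.post(LOGIN_URL, data=payload)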


Solution

  • read this. I think you probably need a token for your POST request, for security reasons. If just the scraping is important, try using Selenium:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    driver = webdriver.Firefox()
    driver.get("https://portal.discover.com/customersvcs/universalLogin/signin")
    driver.maximize_window()
    
    # Wait for the log-in link in the page header to be clickable, then click it.
    WebDriverWait(driver, 5).until(
        EC.element_to_be_clickable(
            (By.XPATH, "/html/body/div[1]/header/div[1]/div[2]/div[2]/ul/li[3]/a")
        )
    ).click()

    # Fill in the credentials and submit (Selenium 4 syntax; the old
    # find_element_by_xpath helpers have been removed).
    driver.find_element(By.XPATH, "//*[@id='userid']").send_keys("your_user_id")
    driver.find_element(By.XPATH, "//*[@id='password']").send_keys("your_password")
    driver.find_element(By.XPATH, "//*[@id='log-in-button']").click()
    

    I got an error when I used the left panel to log in, which is why the script clicks the link in the header instead.
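
    Once the login succeeds, you can hand the rendered page to BeautifulSoup the same way as in your script. A minimal sketch, assuming the homepage URL from your question still applies after a Selenium login:

    from bs4 import BeautifulSoup as bs

    # After the login clicks above, load the account homepage in the same
    # browser session and parse whatever it renders.
    driver.get("https://card.discover.com/cardmembersvcs/achome/homepage")
    soup = bs(driver.page_source, "html.parser")
    print(soup.text)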