python download edgar

Download a txt file from EDGAR


I want to download this file to my local drive: https://www.sec.gov/Archives/edgar/data/1556179/0001104659-20-000861.txt

Here is my code:

import requests
import urllib
from bs4 import BeautifulSoup
import re
  
path=r"https://www.sec.gov/Archives/edgar/data/1556179/0001104659-20-000861.txt" 
r=requests.get(path, headers={"User-Agent": "b2g"})
content=r.content.decode('utf8')
soup=BeautifulSoup(content, "html5lib")
soup=str(soup)
lines=soup.split("\\n")

dest_url=r"C://Users/YL/Downloads/a.txt"
fx=open(dest_url,'w')
for line in lines:
    fx.write(line + '\n')

Running this produces an error.

How should I download the file then? Thanks a lot!


Solution

  • The download is fine. The problem is that str(soup) is not well-defined, and throws html5lib into an endless loop. You probably meant

    soup = soup.text
    

    which (crudely) extracts the actual readable text from the BeautifulSoup object.
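
    Since the download itself is fine, you may not need BeautifulSoup at all if the goal is only to get the file onto your drive. Below is a minimal sketch, assuming a byte-for-byte copy is what you want. It uses the standard library's urllib.request rather than requests, so it runs without extra packages. The function name download_file and its parameters are made up for illustration, and the "b2g" User-Agent is just the value from the question.

```python
import shutil
import urllib.request

URL = "https://www.sec.gov/Archives/edgar/data/1556179/0001104659-20-000861.txt"


def download_file(url, dest_path, user_agent="b2g"):
    """Copy the body served at `url` to `dest_path` unchanged.

    Returns the number of bytes written. `download_file` is a
    hypothetical helper, not part of any library.
    """
    # Send the same User-Agent header the question used with requests.
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    # Stream the response straight to disk in binary mode, so no decoding,
    # HTML parsing, or line splitting is involved.
    with urllib.request.urlopen(request, timeout=30) as response, \
            open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
        return out.tell()


# Example call (commented out so that nothing is fetched on import):
# download_file(URL, r"C:/Users/YL/Downloads/a.txt")
```

    With requests, the equivalent would be to write r.content to a file opened in "wb" mode. Either way the file is saved exactly as served, and you can run it through BeautifulSoup later if you only want the readable text.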