I am new to Python. I am trying to parse some 10-Ks from EDGAR using the edgartools and sec-parsers Python modules. Here is my code:
```python
import pandas as pd
# pip install edgartools
from edgar import *

# Tell the SEC who you are
set_identity("Your Name myemail@outlook.com")

filings = get_filings(form="10-K", filing_date="2023-12-15:2024-07-16", amendments=False)
filings_df = filings.to_pandas()  # all filings info now in a data frame
filings[5].document.url  # to get the URL of an individual document
```
But when I run the following code:

```python
for x in filings:
    filings[x].document.url
```

I get the error: `'Filing' object cannot be interpreted as an integer`.
I am not sure why this is happening. I want the results of the for loop above in a list so that I can later use them with sec-parsers, like this:
```python
from sec_parsers import Filing, download_sec_filing, set_headers

def print_first_n_lines(text, n):
    lines = text.split('\n')
    for line in lines[:n]:
        print(line)

html = download_sec_filing(filings[6070].document.url)  # for example, I need the URL for the filing at index 6070
filing = Filing(html)
filing.html
filing.parse()  # parses filing
filing.xml
item1c = filing.find_nodes_by_title('item 1c')[0]
item1c_text = filing.get_node_text(item1c)
print_first_n_lines(item1c_text, 50)
```
My goal is to create a data frame for all filings with the text of Item 1C from each 10-K, and add this Item 1C text to the filings_df data frame as an additional column. To match the Item 1C text to the rows of filings_df, we can use the CIK and the year (taken from the filing_date variable).
Thanks
The error occurs because iterating over `filings` yields Filing objects, not integer positions. In `for x in filings:`, `x` is already a Filing, so `filings[x]` tries to use that object as an index, which is why Python complains that a 'Filing' object cannot be interpreted as an integer. You don't need to index at all: iterate over the Filing objects directly and read `filing.document.url` from each one.
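To see the difference in isolation, here is a minimal sketch that runs without network access. `FakeDocument` and `FakeFiling` are hypothetical stand-ins that only mimic the `.document.url` shape used in the question; they are not the real edgartools classes. The `operator.index` call shows the same kind of message Python raises whenever an object without integer semantics is used where an integer is required.

```python
import operator
from dataclasses import dataclass


# Hypothetical stand-ins mimicking only the `.document.url` attribute chain;
# these are NOT the real edgartools classes.
@dataclass
class FakeDocument:
    url: str


@dataclass
class FakeFiling:
    document: FakeDocument


filings = [
    FakeFiling(FakeDocument("https://example.com/a.htm")),
    FakeFiling(FakeDocument("https://example.com/b.htm")),
]

# Iterating yields the objects themselves, not their positions.
for x in filings:
    print(type(x).__name__)  # FakeFiling

# Using such an object where an integer is expected raises a TypeError
# with the same wording as the error in the question.
error_message = None
try:
    operator.index(filings[0])
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Collect the URLs into a list by iterating over the objects directly.
urls = [filing.document.url for filing in filings]
print(urls)

# If the position is also needed, enumerate() provides it.
for i, filing in enumerate(filings):
    print(i, filing.document.url)
```

With the real edgartools object the same pattern applies, e.g. `urls = [f.document.url for f in filings]`, while integer indexing such as `filings[6070].document.url` still works for one-off lookups.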
```python
import pandas as pd
# pip install edgartools
from edgar import *
# Imported after edgar, so the name Filing is bound to sec_parsers.Filing
from sec_parsers import Filing, download_sec_filing, set_headers

# Tell the SEC who you are
set_identity("Your Name myemail@outlook.com")

# Fetch filings
filings = get_filings(form="10-K", filing_date="2023-12-15:2024-07-16", amendments=False)

# Convert filings to a DataFrame
filings_df = filings.to_pandas()

# Create a list to store the Item 1c text
item1c_texts = []

# Iterate over the Filing objects directly (no integer indexing needed)
for filing in filings:
    url = filing.document.url

    # Download and parse the filing
    html = download_sec_filing(url)
    sec_filing = Filing(html)
    sec_filing.parse()

    # Extract the text for Item 1c (None if the section is not found)
    item1c_nodes = sec_filing.find_nodes_by_title('item 1c')
    if item1c_nodes:
        item1c_text = sec_filing.get_node_text(item1c_nodes[0])
    else:
        item1c_text = None

    # Key each result by the accession number, which is unique per filing
    item1c_texts.append({
        'accession_number': filing.accession_no,
        'item1c_text': item1c_text,
    })

# Create a DataFrame from the Item 1c text data
item1c_df = pd.DataFrame(item1c_texts)

# Merge the Item 1c text into the filings DataFrame.
# Assumes filings_df has an 'accession_number' column; check filings_df.columns.
# how='left' keeps every filing even if no text was extracted for it.
filings_df = filings_df.merge(item1c_df, on='accession_number', how='left')

# Print the final DataFrame
print(filings_df)
```
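In this code:

- The loop iterates over `filings` directly, so each `filing` is a Filing object and no integer index is needed.
- Each filing's document URL is downloaded and parsed with sec-parsers.
- The text of the first node titled 'item 1c' is collected, or None if no such node is found.
- The collected results are turned into a DataFrame and merged back into `filings_df`.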
This should help you achieve your goal of creating a DataFrame with the text from "Item 1c" in the 10-K filings.
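If you would rather match on CIK and year, as suggested in the question, here is a minimal pandas sketch on made-up rows. It assumes `filings_df` has lowercase `cik` and `filing_date` columns (check `filings_df.columns` on your data); the `item1c_text` column name and all sample values are hypothetical.

```python
import pandas as pd

# Made-up rows standing in for filings_df; the 'cik' and 'filing_date'
# column names are an assumption about the edgartools output.
filings_df = pd.DataFrame({
    "form": ["10-K", "10-K", "10-K"],
    "cik": [111, 222, 333],
    "filing_date": ["2023-12-20", "2024-02-15", "2024-03-01"],
})

# Made-up extraction results (hypothetical column 'item1c_text').
item1c_df = pd.DataFrame({
    "cik": [111, 222],
    "filing_date": ["2023-12-20", "2024-02-15"],
    "item1c_text": ["Cybersecurity text A", "Cybersecurity text B"],
})

# Derive the filing year on both sides; to_datetime also accepts date objects.
for df in (filings_df, item1c_df):
    df["year"] = pd.to_datetime(df["filing_date"]).dt.year

# Left merge keeps every filing; rows without Item 1C text get NaN.
# validate="one_to_one" raises a MergeError if a CIK/year pair repeats.
merged = filings_df.merge(
    item1c_df[["cik", "year", "item1c_text"]],
    on=["cik", "year"],
    how="left",
    validate="one_to_one",
)
print(merged[["cik", "year", "item1c_text"]])
```

The `validate` check matters here: if a company filed more than one 10-K in the same year, a CIK/year key is ambiguous, and a per-filing identifier such as the accession number is a safer join key.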