Iterating over an object in Python

I am new to Python. I am trying to parse some 10-K filings from EDGAR using the edgartools and sec-parsers Python packages. Here is my code -

import pandas as pd
# pip install edgartools
from edgar import *

# Tell the SEC who you are
set_identity("Your Name myemail@outlook.com")

filings = get_filings( form = "10-K", filing_date="2023-12-15:2024-07-16",amendments=False )

filings_df = filings.to_pandas() # all filings info now in a data frame 

filings[5].document.url # to get the URL of an individual document

But when I run the following code -

for x in filings:
    filings[x].document.url

I get this error: 'Filing' object cannot be interpreted as an integer.

I am not sure why this is happening. I want the results of the for loop above in a list so that I can later use them with sec-parsers like this -

from sec_parsers import Filing, download_sec_filing, set_headers

def print_first_n_lines(text, n):
    lines = text.split('\n')
    for line in lines[:n]:
        print(line)

html = download_sec_filing(filings[6070].document.url) # for example I need the url for filings index 6070

filing = Filing(html)
filing.html

filing.parse() # parses filing
filing.xml


item1c = filing.find_nodes_by_title('item 1c')[0]
item1c_text = filing.get_node_text(item1c)
print_first_n_lines(item1c_text,50)

My goal is to create a data frame for all filings containing the text of Item 1C of each 10-K, and to add that text to the filings_df data frame as an additional column. Note that to match the Item 1C text to the rows of filings_df, we can use the CIK and the year (derived from the filing_date variable).

Thanks

Solution

The error occurs because "for x in filings" already hands you each Filing object in turn, so x is a Filing, not an integer position. Writing filings[x] then passes that Filing object as an index, and Python complains that it cannot be interpreted as an integer. Since filings is iterable, use the loop variable directly (x.document.url) instead of indexing back into filings.
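To make the cause of the error concrete, here is a minimal sketch that reproduces it without touching the SEC. FakeFiling and FakeDocument are hypothetical stand-ins, not edgartools classes, and a plain list is used in place of the real filings object, so the exact wording of the error message will differ from the one in the question.

```python
from dataclasses import dataclass


@dataclass
class FakeDocument:
    url: str


@dataclass
class FakeFiling:
    # Hypothetical stand-in for a filing; only .document.url is modelled
    document: FakeDocument


filings = [
    FakeFiling(FakeDocument(f"https://example.com/doc{i}.htm")) for i in range(3)
]

# What the question did: x is a FakeFiling, not a position, so filings[x] fails
try:
    for x in filings:
        filings[x].document.url
except TypeError as exc:
    error_message = str(exc)

# Fix 1: use the loop variable directly and collect the URLs in a list
urls = [filing.document.url for filing in filings]

# Fix 2: if the position is also needed, let enumerate supply it
indexed_urls = [(i, filing.document.url) for i, filing in enumerate(filings)]

print(error_message)
print(urls)
```

The list comprehension in "Fix 1" gives the list of URLs asked for in the question. The full loop for the real filings follows the same pattern.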

import pandas as pd
# pip install edgartools
from edgar import *
from sec_parsers import Filing, download_sec_filing, set_headers

# Tell the SEC who you are
set_identity("Your Name myemail@outlook.com")

# Fetch filings
filings = get_filings(form="10-K", filing_date="2023-12-15:2024-07-16", amendments=False)

# Convert filings to a DataFrame
filings_df = filings.to_pandas()

# Create a list to store the Item 1c text
item1c_texts = []

# Iterate over each filing
for filing in filings:
    url = filing.document.url
    cik = filing.cik
    filing_date = filing.filing_date
    
    # Download and parse the filing
    html = download_sec_filing(url)
    sec_filing = Filing(html)
    sec_filing.parse()
    
    # Extract the text for Item 1c
    item1c_nodes = sec_filing.find_nodes_by_title('item 1c')
    if item1c_nodes:
        item1c_text = sec_filing.get_node_text(item1c_nodes[0])
    else:
        item1c_text = None
    
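    # Use the same key names as the columns of filings_df so the merge can match them
    item1c_texts.append({
        'cik': cik,
        'filing_date': filing_date,
        'Item 1c Text': item1c_text
    })

# Create a DataFrame from the Item 1c text data
item1c_df = pd.DataFrame(item1c_texts)

# Merge the Item 1c text into the filings DataFrame.
# Assumes to_pandas() names these columns 'cik' and 'filing_date'; check filings_df.columns.
# how='left' keeps every filing, even when no Item 1c text was found.
filings_df = filings_df.merge(item1c_df, on=['cik', 'filing_date'], how='left')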

# Print the final DataFrame
print(filings_df)

In this code:

1. You iterate over the filings directly, reading the URL, CIK, and filing date from each Filing object.
2. You download and parse each filing with sec-parsers.
3. You extract the text for "Item 1c" from each filing (or None if the section is not found) and store it in a list.
4. You create a DataFrame from that list and merge it with filings_df.
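The question mentions matching on CIK and year rather than on the full filing date. A minimal sketch of that variant, using toy data, could look like the following. The column names (cik, company, filing_date, item1c_text) and all values are assumptions made up for illustration; check filings_df.columns for the real names.

```python
import datetime as dt

import pandas as pd

# Toy stand-in for filings.to_pandas(); column names are an assumption
filings_df = pd.DataFrame({
    "cik": [1001, 1002, 1003],
    "company": ["Alpha Corp", "Beta Inc", "Gamma LLC"],
    "filing_date": [dt.date(2024, 2, 1), dt.date(2024, 3, 15), dt.date(2023, 12, 20)],
})

# Toy stand-in for the rows collected in the loop; Gamma LLC is missing on purpose
item1c_df = pd.DataFrame({
    "cik": [1001, 1002],
    "filing_date": [dt.date(2024, 2, 1), dt.date(2024, 3, 15)],
    "item1c_text": ["Cybersecurity text A", None],
})

# Derive the year from filing_date on both sides
for df in (filings_df, item1c_df):
    df["year"] = pd.to_datetime(df["filing_date"]).dt.year

# A left merge keeps every filing; rows without Item 1C text get a missing value
merged = filings_df.merge(
    item1c_df[["cik", "year", "item1c_text"]],
    on=["cik", "year"],
    how="left",
)
print(merged[["cik", "year", "item1c_text"]])
```

One caveat with this design: if a company files more than one 10-K in the same calendar year, merging on CIK and year will duplicate rows, so a key that is unique per filing is safer when one is available.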

This should help you achieve your goal of creating a DataFrame with the text from "Item 1c" in the 10-K filings.