I'm trying to use the EDGAR API to retrieve the 10-Q for any given company (identified by the CIK value provided). The code below retrieves the most recent 10-Q for Tesla. About 30 methods are attached to the returned object, such as keys, values, items, and text_content. text_content appears to be the only one that does not return an empty list []. However, the text is not easy to parse because the 10-Q varies considerably from one company to another.
Undoubtedly, someone will ask why I set no_of_documents=2. If this parameter is set to 1, the wrong document (not a 10-Q) is returned. With any value over 1, actual 10-Qs are retrieved. I have no idea why the API behaves this way.
from edgar import Company

def func(cik):
    company = Company("", cik)
    tree = company.get_all_filings(filing_type="10-Q")
    documents = Company.get_documents(tree, no_of_documents=2)
    return documents[0]

test = func('0001318605')
What I'd like to do is print out the raw XML to take a peek at its underlying structure, then parse it with either xmltodict or xml.etree.ElementTree. However, I'm receiving the following errors.
Using ET
import xml.etree.ElementTree as ET
ET.parse(test)
>>>
TypeError: expected str, bytes or os.PathLike object, not HtmlElement
Using xmltodict
import xmltodict
xmltodict.parse(test)
TypeError: a bytes-like object is required, not 'HtmlElement'
Again, my goal is to search and navigate the XML content; however, without knowing what the tags are, I'm a bit stuck. How can I proceed?
You don't need to parse test; you can use xpath methods directly on it. For example:
test.xpath('//entity/segment/explicitmember/text()')
Outputs:
['tsla:OperatingLeaseVehiclesMember',
'tsla:OperatingLeaseVehiclesMember',
'tsla:SolarEnergySystemsMember',
'tsla:SolarEnergySystemsMember',
'tsla:AutomotiveSegmentMember',
'tsla:AutomotiveSegmentMember',
etc. and
test.xpath('//context/period/instant/text()')
outputs:
['2020-07-20',
'2020-06-30',
'2019-12-31',
'2020-06-30',
'2019-12-31',
and so on.
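If you don't know the tag names in advance, one way to get oriented is to walk the tree, count the distinct tags, and then build xpath expressions from whatever turns up. Below is a minimal sketch of that idea. It assumes a small made-up sample in place of the filing, and count_tags is a hypothetical helper, not part of the edgar package. It is run here on a standard-library element so that it is self-contained, but lxml elements expose the same iter() and tag interface, so it should accept test as well.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Made-up sample standing in for the filing; not real EDGAR data.
SAMPLE = """
<xbrl>
  <context id="c1">
    <entity><segment><explicitmember>tsla:AutomotiveSegmentMember</explicitmember></segment></entity>
    <period><instant>2020-06-30</instant></period>
  </context>
  <context id="c2">
    <period><instant>2019-12-31</instant></period>
  </context>
</xbrl>
""".strip()


def count_tags(root):
    """Count how often each tag name occurs under root (hypothetical helper)."""
    # Comments and processing instructions can carry non-string tags; skip them.
    return Counter(el.tag for el in root.iter() if isinstance(el.tag, str))


root = ET.fromstring(SAMPLE)
tags = count_tags(root)
for name, n in tags.most_common():
    print(name, n)

# Once a tag of interest shows up, query for it by path.
instants = [el.text for el in root.findall(".//context/period/instant")]
print(instants)
```

On a real filing the tag list will be much longer, but the most frequent names (context, period, and so on) are usually a reasonable place to start writing xpath queries.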
Good luck; parsing XBRL filings is not an easy task...
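As for the two TypeErrors in the question: ET.parse expects a filename or a file object, and xmltodict.parse expects a string, bytes, or a file-like object, so neither accepts an element that has already been parsed. A minimal sketch of the difference using only the standard library, with a made-up sample string (an assumption, not real filing data) in place of the document text:

```python
import io
import xml.etree.ElementTree as ET

# Made-up stand-in for the filing's markup; not real EDGAR data.
SAMPLE = (
    "<xbrl><context id='c1'><period>"
    "<instant>2020-06-30</instant>"
    "</period></context></xbrl>"
)

# ET.fromstring parses markup held in a string and returns the root element.
root = ET.fromstring(SAMPLE)

# ET.parse wants a path or a file object, so wrap the string in one.
tree = ET.parse(io.StringIO(SAMPLE))

# Serialising an element back to text is how to peek at the raw structure.
raw = ET.tostring(root, encoding="unicode")
print(raw)
```

For the lxml object in the question, lxml.html.tostring(test, encoding="unicode") should give the corresponding string, which could then be printed or handed to xmltodict.parse. Whether that output is well-formed enough for a strict XML parser is not guaranteed, since the document was parsed as HTML.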