My question is about the source of truth regarding EDGAR.
I am developing a simple service for my own use that picks stocks based on fundamental data, so I went to the EDGAR site and downloaded all the company facts. While testing, I realized that there are discrepancies between the JSON and the XBRL viewer:
As a specific example, let's take the account CostOfGoodsAndServicesSold. I separate statements into annual and quarterly ones. Reading annual data, the JSON from EDGAR (linked above) only provides this (quarterly is slightly better but also incomplete):
```json
{
  "start": "2018-06-01",
  "end": "2019-05-31",
  "val": 1722300000,
  "accn": "0001047469-19-004266",
  "fy": 2019,
  "fp": "FY",
  "form": "10-K",
  "filed": "2019-07-18",
  "frame": "CY2018"
}
```
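One way to do the annual/quarterly split on the downloaded company facts is sketched below. It assumes the companyfacts layout (`facts` → taxonomy → concept → `units` → unit → list of entries) and classifies each entry by the length of its own start/end period rather than by `fp`, because `fy`/`fp` describe the filing an entry was reported in, not the entry's period. The day-count thresholds are arbitrary choices, and the two extra entries are made-up placeholders, not real data.

```python
from datetime import date

# Excerpt shaped like the SEC companyfacts JSON. The first entry is the one
# quoted above; the other two are hypothetical placeholders (val=0).
companyfacts = {
    "facts": {
        "us-gaap": {
            "CostOfGoodsAndServicesSold": {
                "units": {
                    "USD": [
                        {"start": "2018-06-01", "end": "2019-05-31", "val": 1722300000,
                         "accn": "0001047469-19-004266", "fy": 2019, "fp": "FY",
                         "form": "10-K", "filed": "2019-07-18", "frame": "CY2018"},
                        # Hypothetical three-month entry
                        {"start": "2019-06-01", "end": "2019-08-31", "val": 0, "fp": "Q1", "form": "10-Q"},
                        # Hypothetical six-month year-to-date entry
                        {"start": "2019-06-01", "end": "2019-11-30", "val": 0, "fp": "Q2", "form": "10-Q"},
                    ]
                }
            }
        }
    }
}


def duration_days(entry):
    """Length of the reported period in days (end minus start)."""
    return (date.fromisoformat(entry["end"]) - date.fromisoformat(entry["start"])).days


def split_by_period(entries):
    """Split duration entries into annual (~1 year) and quarterly (~3 months)."""
    annual, quarterly = [], []
    for entry in entries:
        if "start" not in entry:  # instant facts (balance-sheet items) have no start
            continue
        days = duration_days(entry)
        if 350 <= days <= 380:
            annual.append(entry)
        elif 80 <= days <= 100:
            quarterly.append(entry)
        # Other durations (e.g. six- or nine-month year-to-date) are ignored here.
    return annual, quarterly


entries = companyfacts["facts"]["us-gaap"]["CostOfGoodsAndServicesSold"]["units"]["USD"]
annual, quarterly = split_by_period(entries)
print([e["val"] for e in annual])  # [1722300000]
```

Classifying by duration also keeps the year-to-date entries that 10-Q filings report alongside the three-month figures out of the quarterly series.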
OK, so I thought, let's see how another site calculates it. I see that they take it from the XBRL link (linked above), which does have the data (900 + 902.8) that is not present in the JSON:
So basically, I'm blocked at the moment. The filings have more complete data than the JSON submissions, and my questions are:
Here is code that obtains the facts with the Brel engine (disclaimer: built by us at ETH as an academic project):
```python
from brel import Filing
from brel.utils import open_edgar, pprint

# Load the filing
filing = open_edgar(cik="1750", filing_type="10-K", date="2022-05-31")

# Get the concept
CostOfGoodsAndServicesSold = filing.get_concept("us-gaap:CostOfGoodsAndServicesSold")

# Get all the facts associated with this concept
facts = filing.get_facts_by_concept(CostOfGoodsAndServicesSold)

# Tabular display for a human (the API also allows processing the facts with code)
pprint(facts)
```
This is the output (obtained by running the above code in a JupyterLab notebook):
You can see that this returns the six facts corresponding to your screenshot, with the correct dimensional characteristics.
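Once the dimensional facts are available, a reported total and its breakdown can be compared in plain Python. The sketch below does not use Brel's own objects: it assumes hypothetical, simplified dict records with `period`, `dimensions` (axis → member) and `value` keys, and the values are made up. The axis and member names are only illustrative. Note that summing members only reproduces the total when the members are exhaustive for that axis.

```python
from collections import defaultdict

# Hypothetical, simplified fact records (NOT Brel objects); values are made up.
PERIOD = ("2021-06-01", "2022-05-31")
facts = [
    {"period": PERIOD, "dimensions": {}, "value": 150.0},  # undimensioned total
    {"period": PERIOD, "dimensions": {"srt:ProductOrServiceAxis": "us-gaap:ProductMember"}, "value": 100.0},
    {"period": PERIOD, "dimensions": {"srt:ProductOrServiceAxis": "us-gaap:ServiceMember"}, "value": 50.0},
]


def totals_by_period(facts):
    """Return {period: (reported_total_or_None, {axis: sum_of_member_values})}."""
    reported = {}
    member_sums = defaultdict(lambda: defaultdict(float))
    for fact in facts:
        dims = fact["dimensions"]
        if not dims:
            reported[fact["period"]] = fact["value"]
        elif len(dims) == 1:
            (axis,) = dims  # single-axis breakdown
            member_sums[fact["period"]][axis] += fact["value"]
        # Facts qualified by several axes at once are skipped in this sketch.
    periods = set(reported) | set(member_sums)
    return {p: (reported.get(p), dict(member_sums[p])) for p in periods}


summary = totals_by_period(facts)
print(summary[PERIOD])  # (150.0, {'srt:ProductOrServiceAxis': 150.0})
```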
Brel can be installed with:

```shell
pip install brel-xbrl
```
Brel is a processor that respects Edgar Codd's data independence principle. We strive to make it independent of the underlying syntax (whether XML, JSON, Inline XBRL, CSV, ...) to shield the user from physical details and stick to the XBRL data model.
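To illustrate what data independence means in practice, here is a toy sketch (not Brel's implementation) in which two physical formats, a companyfacts JSON entry and a CSV row with hypothetical column names, are parsed into the same logical `Fact` record. Code written against `Fact` never needs to know which syntax the data came from.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fact:
    """Logical record, independent of the physical syntax it was read from."""
    concept: str
    start: Optional[str]
    end: str
    value: float
    unit: str


def from_companyfacts_entry(concept, unit, entry):
    # Maps one entry of the companyfacts JSON onto the logical model.
    return Fact(concept, entry.get("start"), entry["end"], float(entry["val"]), unit)


def from_csv_row(row):
    # Hypothetical CSV layout: concept,start,end,value,unit
    return Fact(row["concept"], row["start"] or None, row["end"], float(row["value"]), row["unit"])


json_entry = {"start": "2018-06-01", "end": "2019-05-31", "val": 1722300000}
csv_text = (
    "concept,start,end,value,unit\n"
    "us-gaap:CostOfGoodsAndServicesSold,2018-06-01,2019-05-31,1722300000,USD\n"
)

a = from_companyfacts_entry("us-gaap:CostOfGoodsAndServicesSold", "USD", json_entry)
b = from_csv_row(next(csv.DictReader(io.StringIO(csv_text))))
print(a == b)  # True
```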