I'm looking to transform XBRL report instances, specifically financial reports such as those filed with the SEC, into Python dictionaries or JSON. I've spent time developing code using bs4 (Beautiful Soup), but ideally I'd like to leverage the open-source Arelle library.
My understanding is there is a plug-in for the Arelle software package called "saveLoadableOIM". There is general guidance published by XBRL.org; however, it stops short of practical implementation.
http://www.xbrl.org/Specification/xbrl-json/CR-2020-05-06/xbrl-json-CR-2020-05-06.html
I have found the documentation for command prompt usage of Arelle to be out of date and inapplicable to Python 3.x. Could anyone provide guidance on how to operate Arelle through the Python command prompt and, specifically, how to convert an SEC XBRL report instance into JSON? I'd like a model that adapts to future changes in the standard taxonomies, particularly GAAP:
https://www.sec.gov/info/edgar/edgartaxonomies.shtml
It would be particularly helpful to have sample code for mapping the following XBRL report instance of an MSFT 10-K into JSON:
https://www.sec.gov/Archives/edgar/data/789019/000156459018019062/msft-20180630.xml
If there are limitations in the existing Arelle library, I'd like to understand what these are.
I invoke Arelle under Python 3 with:
python3 $HOME/Arelle/arelleCmdLine.py
This is on Linux, and assumes I have Arelle checked out in my home directory as Arelle.
To load a plugin, use --plugins and give it the name of a file under the Arelle/arelle/plugin directory (without the .py extension). For example, --plugins=saveLoadableOIM. You can then add --help and you should see additional options included in the help message.
This works for me:
python3 $HOME/Arelle/arelleCmdLine.py --plugins=saveLoadableOIM --saveLoadableOIM=msft.json -f https://www.sec.gov/Archives/edgar/data/789019/000156459018019062/msft-20180630.xml
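Since the question asks for Python dictionaries, here is a minimal sketch of driving the same command from a Python script and loading the result. It only wraps the command line shown above with subprocess; the function names build_arelle_command and convert_to_json are my own, and the Arelle path assumes the same checkout in the home directory. It does not use any Arelle Python API directly.

```python
import json
import subprocess
import sys
from pathlib import Path

# Assumption: Arelle is checked out in the home directory, as in the command above.
ARELLE_CMD = Path.home() / "Arelle" / "arelleCmdLine.py"


def build_arelle_command(arelle_cmd, instance, json_out):
    """Return the argument list equivalent to the shell command above."""
    return [
        sys.executable,                     # the running Python 3 interpreter
        str(arelle_cmd),
        "--plugins=saveLoadableOIM",        # load the plugin by file name
        f"--saveLoadableOIM={json_out}",    # where the plugin writes the JSON
        "-f",
        str(instance),                      # URL or local path of the instance
    ]


def convert_to_json(instance, json_out, arelle_cmd=ARELLE_CMD):
    """Run Arelle, then load the JSON file it wrote into a Python dict."""
    subprocess.run(build_arelle_command(arelle_cmd, instance, json_out), check=True)
    with open(json_out, encoding="utf-8") as fh:
        return json.load(fh)
```

Calling convert_to_json with the MSFT instance URL and "msft.json" should leave the same msft.json on disk and return its contents as a dictionary, provided Arelle runs successfully in your environment.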
Example of extracting data using the awesome jq:
jq '[.facts[] | select( .dimensions.concept | test(":GrossProfit$") )] | sort_by(.dimensions.period)[-1]' msft.json
This gets the most recent GrossProfit value:
{
  "value": "20343000000",
  "decimals": -6,
  "dimensions": {
    "concept": "us-gaap:GrossProfit",
    "entity": "cik:0000789019",
    "period": "2018-04-01T00:00:00/2018-07-01T00:00:00",
    "unit": "iso4217:USD"
  }
}
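If you would rather stay in Python than use jq, the same query can be sketched roughly as follows. The helper name latest_fact is hypothetical, and the code assumes only the structure visible in the output above (a top-level "facts" member whose entries have "dimensions" with "concept" and "period"). Because the format may still change, it accepts "facts" as either an object or an array.

```python
import json


def latest_fact(report, local_name):
    """Return the fact for the given concept local name with the latest period.

    Mirrors the jq filter: match concepts ending in ":<local_name>",
    sort by the period string, and take the last entry.
    """
    facts = report.get("facts", [])
    if isinstance(facts, dict):
        # "facts" keyed by fact id: only the values are needed here
        facts = facts.values()
    suffix = ":" + local_name
    matches = [
        fact
        for fact in facts
        if fact.get("dimensions", {}).get("concept", "").endswith(suffix)
    ]
    if not matches:
        return None
    # Periods in the sample output are ISO 8601 strings, so they sort as text
    return sorted(matches, key=lambda f: f["dimensions"].get("period", ""))[-1]
```

To use it on the file produced earlier, load it with json.load(open("msft.json")) and call latest_fact(report, "GrossProfit"). Note that "value" is a string in the output above, so convert it (for example with decimal.Decimal) before doing arithmetic.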
I should note that the xBRL-JSON specification is not yet finalised, and it's likely that the format of this JSON may change slightly over time. I'd expect Arelle to be updated to the final version once it's available.