What is the best way in Python to parse these results? I have tried regex but can't get it to work. I am looking for a dictionary with title, author, etc. as keys.
@article{perry2000epidemiological,
title={An epidemiological study to establish the prevalence of urinary symptoms and felt need in the community: the Leicestershire MRC Incontinence Study},
author={Perry, Sarah and Shaw, Christine and Assassa, Philip and Dallosso, Helen and Williams, Kate and Brittain, Katherine R and Mensah, Fiona and Smith, Nigel and Clarke, Michael and Jagger, Carol and others},
journal={Journal of public health},
volume={22},
number={3},
pages={427--434},
year={2000},
publisher={Oxford University Press}
}
You might be looking for a BibTeX parser such as bibtexparser: https://bibtexparser.readthedocs.io/en/master/
Source: https://bibtexparser.readthedocs.io/en/master/tutorial.html#step-0-vocabulary
Create a BibTeX file to use as input:
bibtex = """@ARTICLE{Cesar2013,
  author = {Jean César},
  title = {An amazing title},
  year = {2013},
  month = jan,
  volume = {12},
  pages = {12--23},
  journal = {Nice Journal},
  abstract = {This is an abstract. This line should be long enough to test
     multilines...},
  comments = {A comment},
  keywords = {keyword1, keyword2}
}
"""

with open('bibtex.bib', 'w') as bibfile:
    bibfile.write(bibtex)
Parse it:
import bibtexparser

with open('bibtex.bib') as bibtex_file:
    bib_database = bibtexparser.load(bibtex_file)

print(bib_database.entries)
Output:
[{'journal': 'Nice Journal', 'comments': 'A comment', 'pages': '12--23', 'month': 'jan', 'abstract': 'This is an abstract. This line should be long enough to test\nmultilines...', 'title': 'An amazing title', 'year': '2013', 'volume': '12', 'ID': 'Cesar2013', 'author': 'Jean César', 'keyword': 'keyword1, keyword2', 'ENTRYTYPE': 'article'}]
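If installing a library is not an option, a flat entry like the one in the question can also be handled with the standard library alone. The following is only a minimal sketch, not a general BibTeX parser: `parse_flat_entry` is a hypothetical helper, and it assumes one `key={value}` field per line with no nested braces and no unbraced values such as `month = jan`. The sample reuses the entry from the question with the author list abbreviated. For anything more complex, a real parser is the safer choice.

```python
import re

# Assumption: the entry is "flat": one `key={value}` field per line,
# no nested braces inside values, no unbraced values.
ENTRY_HEAD = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
FIELD = re.compile(r"^\s*(\w+)\s*=\s*\{(.*)\}\s*,?\s*$", re.MULTILINE)


def parse_flat_entry(text):
    """Parse one simple BibTeX entry into a dict (hypothetical helper)."""
    head = ENTRY_HEAD.search(text)
    if head is None:
        raise ValueError("no BibTeX entry found")
    # Mirror the 'ENTRYTYPE' and 'ID' keys shown in the output above.
    entry = {"ENTRYTYPE": head.group(1).lower(), "ID": head.group(2)}
    for key, value in FIELD.findall(text):
        entry[key.lower()] = value
    return entry


# Entry from the question, with the author list abbreviated for brevity.
sample = """@article{perry2000epidemiological,
  title={An epidemiological study to establish the prevalence of urinary symptoms and felt need in the community: the Leicestershire MRC Incontinence Study},
  author={Perry, Sarah and Shaw, Christine and Assassa, Philip and others},
  journal={Journal of public health},
  volume={22},
  number={3},
  pages={427--434},
  year={2000},
  publisher={Oxford University Press}
}"""

entry = parse_flat_entry(sample)
# BibTeX separates authors with the word "and".
authors = entry["author"].split(" and ")

print(entry["title"])
print(authors)
```

Splitting the author field on " and " gives a list of names, which is usually more convenient than the raw string when the goal is a dictionary keyed by field.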