Here is the code.
import pandas as pd
from pymed import PubMed
import numpy as np
pubmed = PubMed(tool="PubMedSearcher", email="myemail@ccc.com")
## PUT YOUR SEARCH TERM HERE ##
search_term = 'Charlie Brown'
results = pubmed.query(search_term, max_results=100000)
articleList = []
articleInfo = []
for article in results:
    # Each result is either a PubMedBookArticle or a PubMedArticle.
    # Convert it to a dictionary with the available toDict() method.
    articleDict = article.toDict()
    articleList.append(articleDict)
# Build a list of dict records holding the article details fetched from the PubMed API
for article in articleList:
    # Sometimes article['pubmed_id'] holds several IDs separated by newlines; take the first one, which is the article's own PubMed ID
    pubmedId = article['pubmed_id'].partition('\n')[0]
    # Append the article info to the list
    articleInfo.append({u'pubmed_id': pubmedId,
                        u'publication_date': article['publication_date'],
                        u'authors': article['authors']})
df=pd.json_normalize(articleInfo)
Running this code would fetch three columns, pubmed_id, publication_date and authors.
Is there a way to unnest the authors column and keep the other two columns? Thanks so much in advance.
If you want to unnest the authors, you have to decide on a strategy first. For example, you can write each author as "lastname, firstname" and separate the authors with ";":
# New column to easily identify how many authors there are in the paper
df['n_authors'] = df['authors'].map(len)
# Unnest authors into a single string using the above-mentioned strategy
df['authors'] = df['authors'].map(lambda authors: ';'.join([f"{author['lastname']}, {author['firstname']}" for author in authors]))
Output:
    pubmed_id  publication_date  authors                                            n_authors
0   35435469   2022-04-19        Easwaran, Raju;Khan, Moin;Sancheti, Parag;Shya...  41
1   34480858   2021-09-05        Flaxman, Amy;Marchevsky, Natalie G;Jenkin, Dan...  38
2   30857579   2019-03-13        Brown, Charlie                                     1
3   28640023   2017-06-24        Thornton, Kevin C;Schwarz, Jennifer J;Gross, A...  12
4   24195874   2013-11-08        Bicket, Mark C;Gupta, Anita;Brown, Charlie H;C...  4
5   21741796   2011-07-12        Bird, Jonathan H;Carmont, Michael R;Dhillon, M...  7
6   21324873   2011-02-18        Cohen, Steven P;Brown, Charlie;Kurihara, Conni...  6
7   20228712   2010-03-17        Cohen, Steven P;Kapoor, Shruti G;Nguyen, Cuong...  8
8   20109957   2010-01-30        Cohen, Steven P;Brown, Charlie;Kurihara, Conni...  6
9   18248779   2008-02-06        Whitaker, Iain S;Duggan, Eileen M;Alloway, Rit...  10
10  16917639   2006-08-19        Drayton, William;Brown, Charlie;Hillhouse, Karin   3
11  16282488   2005-11-12        Mao, Hanwen;Lafont, Bernard A P;Igarashi, Tats...  9
12  14581571   2003-10-29        Moniuszko, Marcin;Brown, Charlie;Pal, Ranajit;...  7
13  12163382   2002-08-07        Williams, Kenneth;Schwartz, Annette;Corey, Sar...  10
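If you would rather have one row per author instead of a single joined string, a different strategy is to explode the list and expand each author dict into its own columns. The snippet below is only a sketch: the small DataFrame is made-up sample data standing in for df, and it assumes each entry in authors is a list of dicts with lastname and firstname keys, as in the code above. It starts from the original nested authors column, so run it before overwriting that column with the joined string.

```python
import pandas as pd

# Made-up sample data mimicking the shape of df (not real PubMed records)
df = pd.DataFrame({
    "pubmed_id": ["1", "2"],
    "publication_date": ["2022-04-19", "2019-03-13"],
    "authors": [
        [{"lastname": "Easwaran", "firstname": "Raju"},
         {"lastname": "Khan", "firstname": "Moin"}],
        [{"lastname": "Brown", "firstname": "Charlie"}],
    ],
})

# One row per (paper, author); the other columns are repeated for each author
exploded = df.explode("authors", ignore_index=True)

# Papers with an empty author list become NaN after explode; this sketch drops them
exploded = exploded.dropna(subset=["authors"]).reset_index(drop=True)

# Turn each author dict into separate columns
author_cols = pd.json_normalize(exploded["authors"].tolist())

# Put the paper columns and the author columns side by side
long_df = pd.concat([exploded.drop(columns="authors"), author_cols], axis=1)
print(long_df)
```

Here pubmed_id and publication_date are kept and simply repeated on every author row, so you can still group by pubmed_id later. Whether this or the joined string is the better fit depends on what you want to do with the authors afterwards.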