python bioinformatics biopython genbank

How to get the scientific name from a GenBank accession code with Biopython?


Does anyone know how I can get the scientific name (or all the features) of a GenBank record using only its accession code and Biopython? For example:

>>> from Bio import Entrez
>>> Entrez.email = "someuser@mail.com"
>>> Input = Entrez.someFunction(db="nucleotide", term="AY851612")
>>> output = Entrez.read(Input)
>>> print(output)

"Austrocylindropuntia subulata"

Or alternatively:

>>> print(output)

"LOCUS AY851612 892 bp DNA linear PLN 10-APR-2007
DEFINITION Opuntia subulata rpl16 gene, intron; chloroplast.
ACCESSION AY851612
VERSION AY851612.1 GI:57240072
KEYWORDS .
SOURCE chloroplast Austrocylindropuntia subulata
ORGANISM Austrocylindropuntia subulata
Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
Spermatophyta; Magnoliophyta; eudicotyledons; core eudicotyledons;
Caryophyllales; Cactaceae; Opuntioideae; Austrocylindropuntia.
REFERENCE 1 (bases 1 to 892)
AUTHORS Butterworth,C.A. and Wallace,R.S.
..."

Thanks to all ! =)


Solution

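  • Use efetch rather than esearch: esearch only returns the IDs that match a query, while efetch downloads the record itself. Parsing the result with SeqIO.read gives a SeqRecord whose annotations attribute is a dictionary, so you can access whichever fields you need.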

    In [1]: from Bio import Entrez, SeqIO
    
    In [3]: Entrez.email = '##############'
    
    In [28]: handle = Entrez.efetch(db="nucleotide", id="AY851612", rettype="gb", retmode="text")
    
    In [29]: x = SeqIO.read(handle, 'genbank')
    
    In [30]: print(x)
    ID: AY851612.1
    Name: AY851612
    Description: Opuntia subulata rpl16 gene, intron; chloroplast.
    Number of features: 3
    /date=10-APR-2007
    /sequence_version=1
    /taxonomy=['Eukaryota', 'Viridiplantae', 'Streptophyta', 'Embryophyta', 'Tracheophyta', 'Spermatophyta', 'Magnoliophyta', 'eudicotyledons', 'Gunneridae', 'Pentapetalae', 'Caryophyllales', 'Cactineae', 'Cactaceae', 'Opuntioideae', 'Austrocylindropuntia']
    /data_file_division=PLN
    /references=[Reference(title='Molecular Phylogenetics of the Leafy Cactus Genus Pereskia (Cactaceae)', ...), Reference(title='Direct Submission', ...)]
    /keywords=['']
    /accessions=['AY851612']
    /gi=57240072
    /organism=Austrocylindropuntia subulata
    /source=chloroplast Austrocylindropuntia subulata
    Seq('CATTAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAGAAAGAAAAAAATGA...AGA', IUPACAmbiguousDNA())
    
    In [31]: x.description
    Out[31]: 'Opuntia subulata rpl16 gene, intron; chloroplast.'
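
Note that `x.description` is the DEFINITION line, which here still says "Opuntia subulata". The scientific name asked for in the question is the `/organism` entry shown in the printed record, which lives in the `annotations` dictionary. A minimal sketch of pulling it out follows; the helper names `fetch_record` and `scientific_name` are made up for illustration and are not part of Biopython, and the email address is a placeholder you would replace with your own.

```python
def fetch_record(accession, email):
    """Download one GenBank record by accession and parse it into a SeqRecord.

    Hypothetical helper: needs Biopython installed and network access.
    """
    # Imported lazily so scientific_name() below can be used on records
    # that were already parsed, without touching the network.
    from Bio import Entrez, SeqIO

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.efetch(db="nucleotide", id=accession,
                           rettype="gb", retmode="text")
    try:
        return SeqIO.read(handle, "genbank")
    finally:
        handle.close()


def scientific_name(record):
    """Return the ORGANISM field of a parsed record, or None if it is absent."""
    return record.annotations.get("organism")


def feature_types(record):
    """Return the type of every feature in the record (e.g. 'source', 'gene')."""
    return [feature.type for feature in record.features]
```

With these in place, `scientific_name(fetch_record("AY851612", "you@example.com"))` should give the organism string, and `record.annotations["taxonomy"]` holds the lineage list seen in the printed output above.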