python, google-app-engine, screen-scraping, wikipedia, wikimedia-commons

How do I pull all image links from a Wikimedia page?


I am trying to pull all image links from the Wikipedia page of a famous painter, such as Caravaggio, with the Python wikipedia module.

import wikipedia
page = wikipedia.page("caravaggio")
links = page.links

However, .links only returns the titles of the linked pages, not the actual href or src values that I could use to display an image on my page.

Would it be better to use BeautifulSoup for this?


Solution

  • The page object exposes more than .links — in particular, page.images is a list of full image URLs. Dump the attributes to see what's available:

    #!/usr/bin/python3

    import wikipedia

    page = wikipedia.page("caravaggio")

    print('page.content')
    print(page.content)
    print()
    print('page.html()')           # html() is a method, not a property
    print(page.html())
    print()
    print('page.images')           # full image URLs -- this is what you want
    print(page.images)
    print()
    print('page.links')            # titles of linked pages only
    print(page.links)
    print()
    print('page.original_title')
    print(page.original_title)
    print()
    print('page.pageid')
    print(page.pageid)
    print()
    print('page.parent_id')
    print(page.parent_id)
    print()
    print('page.references')
    print(page.references)
    print()
    print('page.revision_id')
    print(page.revision_id)
    print()
    print('page.sections')
    print(page.sections)
    print()
    print('page.summary')
    print(page.summary)
    print()
    print('page.title')
    print(page.title)
    print()
    print('page.url')
    print(page.url)
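The dump above shows that page.images is the list you are after: it holds full upload.wikimedia.org URLs, so BeautifulSoup is not needed. A minimal sketch of narrowing that list down to raster images (the image_urls helper and the sample list are illustrative, not part of the wikipedia module's API — pages also carry SVG interface icons you will usually want to skip):

```python
def image_urls(urls, extensions=(".jpg", ".jpeg", ".png")):
    """Keep only raster-image URLs, dropping SVG icons and logos."""
    return [u for u in urls if u.lower().endswith(extensions)]

# With the wikipedia module this would be fed from the page object:
#   import wikipedia
#   page = wikipedia.page("caravaggio")
#   paintings = image_urls(page.images)

# Illustrative sample of what page.images typically contains:
sample = [
    "https://upload.wikimedia.org/wikipedia/commons/4/48/Caravaggio_-_David.jpg",
    "https://upload.wikimedia.org/wikipedia/commons/e/e0/Commons-logo.svg",
]
print(image_urls(sample))  # only the .jpg URL survives
```

Each surviving URL can be dropped straight into an img src attribute on your page.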