python url scraperwiki

Scraping links from more than one URL


I'm using ScraperWiki to pull in links from the london-gazette.co.uk site. How would I edit the code so that I can paste in a number of separate search URLs at the bottom which are all collated into the same datastore?

At the moment I can just paste in the new URL, hit run, and the new data is added on to the back of the old data, but I was wondering if there's a way to speed things up and get the scraper to work on several URLs at once? I would be changing the 'notice code' part of the URLs: issues/2013-01-15;2013-01-15/all=NoticeCode%3a2441/start=1

Sorry - new to Stack Overflow and my coding knowledge is pretty much non-existent, but the code is here: https://scraperwiki.com/scrapers/links_1/edit/


Solution

  • The scraper you linked to seems to be empty, but I had a look at the original scraper by Rebecca Ratcliffe. If yours is the same, you only have to put your URLs into a list and loop through them with a for-loop:

    import urlparse  # Python 2 standard-library module

    # Search paths to scrape; only the notice code and dates differ.
    urls = ['/issues/2013-01-15;2013-01-15/all=NoticeCode%3a2441/start=1',
            '/issues/2013-01-15;2013-01-15/all=NoticeCode%3a2453/start=1',
            '/issues/2013-01-15;2013-01-15/all=NoticeCode%3a2462/start=1',
            '/issues/2012-02-10;2013-02-20/all=NoticeCode%3a2441/start=1']

    base_url = 'http://www.london-gazette.co.uk'
    for u in urls:
        # Build the absolute URL, then run the scraper's existing function on it.
        starting_url = urlparse.urljoin(base_url, u)
        scrape_and_look_for_next_link(starting_url)

    Just have a look at this scraper that I copied and adapted accordingly.
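
Since only the notice code changes between your searches, you could also generate the list instead of pasting each path by hand. This is a minimal sketch, not part of the original scraper: `build_search_urls` and `PATH_TEMPLATE` are hypothetical names, the path pattern is copied from the question, and the import assumes Python 3 (on Python 2, `urljoin` lives in the `urlparse` module instead).

```python
from urllib.parse import urljoin  # Python 3; on Python 2: from urlparse import urljoin

BASE_URL = 'http://www.london-gazette.co.uk'
# Path pattern copied from the question; only dates and notice code vary.
PATH_TEMPLATE = '/issues/{start};{end}/all=NoticeCode%3a{code}/start=1'


def build_search_urls(notice_codes, start_date, end_date):
    """Return one absolute search URL per notice code (hypothetical helper)."""
    urls = []
    for code in notice_codes:
        path = PATH_TEMPLATE.format(start=start_date, end=end_date, code=code)
        urls.append(urljoin(BASE_URL, path))
    return urls


search_urls = build_search_urls([2441, 2453, 2462], '2013-01-15', '2013-01-15')
for url in search_urls:
    # In the scraper, call scrape_and_look_for_next_link(url) here instead.
    print(url)
```

To cover a different date range, call the helper again with other dates and loop over the combined list in the same way.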