python, web-scraping, flickr

Why is FlickrAPI hanging after giving the results?


I am trying to scrape images from Flickr using the FlickrAPI. After the image URLs have been scraped, the command line just stays there and nothing happens. The console output looks something like this:

[screenshot of the console output]

Nothing happens after this screen; it just stays like this for a long time, sometimes 1200 seconds or more.

For scraping I used the following code:

import argparse
import os
import time

from flickrapi import FlickrAPI  # pip install flickrapi

# `key`, `secret`, and `download_uri` are defined elsewhere in the full script
def get_urls(search='honeybees on flowers', n=10, download=False):
    t = time.time()
    flickr = FlickrAPI(key, secret)
    license = ()  # https://www.flickr.com/services/api/explore/?method=flickr.photos.licenses.getInfo
    photos = flickr.walk(text=search,  # http://www.flickr.com/services/api/flickr.photos.search.html
                         extras='url_o',
                         per_page=500,  # 1-500
                         license=license,
                         sort='relevance')

    if download:
        dir = os.getcwd() + os.sep + 'images' + os.sep + search.replace(' ', '_') + os.sep  # save directory
        if not os.path.exists(dir):
            os.makedirs(dir)

    urls = []
    for i, photo in enumerate(photos):
        if i < n:
            try:
                # construct url https://www.flickr.com/services/api/misc.urls.html
                url = photo.get('url_o')  # original size
                if url is None:
                    url = 'https://farm%s.staticflickr.com/%s/%s_%s_b.jpg' % \
                          (photo.get('farm'), photo.get('server'), photo.get('id'), photo.get('secret'))  # large size

                if download:
                    download_uri(url, dir)

                urls.append(url)
                print('%g/%g %s' % (i, n, url))
            except:
                print('%g/%g error...' % (i, n))

    # import pandas as pd
    # urls = pd.Series(urls)
    # urls.to_csv(search + "_urls.csv")
    print('Done. (%.1fs)' % (time.time() - t) + ('\nAll images saved to %s' % dir if download else ''))

This function is called as follows:

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--search', type=str, default='honeybees on flowers', help='flickr search term')
    parser.add_argument('--n', type=int, default=10, help='number of images')
    parser.add_argument('--download', action='store_true', help='download images')
    opt = parser.parse_args()

    get_urls(search=opt.search,  # search term
             n=opt.n,  # max number of images
             download=opt.download)  # download images
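
For reference, with this argparse setup the script would be run from the command line roughly like this (the file name flickr_scraper.py is just a placeholder):

python flickr_scraper.py --search "honeybees on flowers" --n 10 --download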

I tried going through the function code multiple times but I can't seem to understand why nothing happens after the scraping is done, as everything else is working fine.


Solution

  • I can't run it, but I think the whole problem is that it gets information about 500 photos - because you have per_page=500 - and the for-loop then runs over all 500 photos, so you have to wait for the loop to finish.

    You should use break to exit this loop after n images

        for i, photo in enumerate(photos):
            if i >= n:
                break
            else:
                try:
                    # ...code...
    

    Or you can simply use itertools.islice(photos, n) - a plain slice like photos[:n] won't work here, because flickr.walk() returns a generator - and then you don't have to check i < n

        from itertools import islice

        for i, photo in enumerate(islice(photos, n)):
            try:
                # ...code...
    

    Alternatively, you could also use per_page=n, so a single API request only asks for as many results as you need.
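
    A sketch of that change, based on the flickr.walk() call from the question (note that walk() still pages through further results on its own, so keeping the break above is still useful):

        photos = flickr.walk(text=search,
                             extras='url_o',
                             per_page=n,  # request only as many results per page as needed
                             license=license,
                             sort='relevance')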


    BTW:

    You can use os.path.join to create the path:

    dir = os.path.join(os.getcwd(), 'images', search.replace(' ', '_'))
    

    If you use exist_ok=True in makedirs() then you don't have to check if not os.path.exists(dir):

    if download:
       dir = os.path.join(os.getcwd(), 'images', search.replace(' ', '_'))
       os.makedirs(dir, exist_ok=True)
    

    If you use enumerate(photos, 1) then you get values 1,2,3,... instead of 0,1,2,...
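
    Putting these suggestions together, a rough sketch of the loop (reusing photos, dir, download and download_uri from the question) could look like this:

        from itertools import islice

        urls = []
        for i, photo in enumerate(islice(photos, n), 1):  # stop after n photos, count from 1
            try:
                url = photo.get('url_o')  # original size
                if url is None:
                    # fall back to the constructed "large size" URL from the question
                    url = 'https://farm%s.staticflickr.com/%s/%s_%s_b.jpg' % \
                          (photo.get('farm'), photo.get('server'), photo.get('id'), photo.get('secret'))

                if download:
                    download_uri(url, dir)

                urls.append(url)
                print('%g/%g %s' % (i, n, url))
            except Exception:
                print('%g/%g error...' % (i, n))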