python pandas api asynchronous geocoder

How do I request JSON from an API using a list of lists instead of a list of values? How can I speed up my API calls?


geocoder.osm() is an API function that takes two values, latitude and longitude, and returns the country name and the rest of the location details as JSON.

I have a large dataframe of 700k rows of coordinates. I wrote the following code to extract the country name for each coordinate:

import geocoder
import itertools

# Shared counter, used only to print progress.
count = itertools.count(start=0)

def geo_rev(x):
    print('starting: ', next(count))
    try:
        # The request itself has to be inside the try block to be guarded.
        g = geocoder.osm([x.latitude, x.longitude], method='reverse').json
    except ValueError:
        g = None
    # Always return two values so result_type='expand' fills both columns.
    if g:
        return [g.get('country'), g.get('city')]
    return ['no country', 'no city']


data[['Country', 'City']] = data[['latitude', 'longitude']].apply(geo_rev, axis=1, result_type='expand')

As you can see, we pass a list of two values for every row: [x.latitude, x.longitude].

The problem is that this code takes forever to run. That is why I want to pass a list of lists to geocoder.osm() to make the requests faster. My idea is to pass something like [list[latitude...], list[longitude...]]. How can I do that? When I try, I get:

TypeError: float() argument must be a string or a number, not 'list'

If my idea of passing a list of lists is wrong and there is another way to make the API calls faster, please tell me.


Solution

  • I found an answer to my own question. Passing a list of lists looked very hard to do, so I tried threading instead. For API calls, threading works much like asyncio: it does not wait for each request to return before sending the next one, but sends several requests at once and collects their responses as they arrive. For me it was much faster, probably ten to twenty times. The following code worked:

    import geocoder
    import itertools
    import concurrent.futures

    # One (latitude, longitude) tuple per row.
    lst = list(zip(data.latitude.tolist(), data.longitude.tolist()))
    count = itertools.count(start=0)

    def geo_rev(x):
        print('starting: ', next(count))
        try:
            g = geocoder.osm([x[0], x[1]], method='reverse').json
        except ValueError:
            g = None
        # Fall back to a placeholder when nothing usable comes back.
        return g.get('country') if g else 'no country'

    with concurrent.futures.ThreadPoolExecutor() as executor:
        # executor.map returns results in the same order as lst.
        countries = list(executor.map(geo_rev, lst))

    data['Country'] = countries
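
The threaded pattern can be tried without touching the network. The sketch below is only an illustration: `fake_reverse` is a hypothetical stand-in for the `geocoder.osm(..., method='reverse')` request, and the coordinates and `max_workers=4` are made-up values. It shows two things: a list of lists `[latitudes, longitudes]` can be regrouped into per-row pairs with `zip`, and `executor.map` hands results back in input order even when the calls finish out of order.

```python
import concurrent.futures
import time

# Hypothetical sample data in the "list of lists" shape from the question.
coords = [[48.85, 40.71, 35.68], [2.35, -74.00, 139.69]]

# zip(*coords) regroups the two lists into one (lat, lon) pair per row.
pairs = list(zip(*coords))

def fake_reverse(pair):
    """Stand-in for the real reverse-geocoding request (assumption)."""
    lat, lon = pair
    # Simulated latency; the first pair sleeps longest, so it finishes last.
    time.sleep(lat / 1000)
    return f"country@{lat},{lon}"

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    # map() yields results in the same order as `pairs`,
    # regardless of which call completes first.
    results = list(executor.map(fake_reverse, pairs))
```

Because the order is preserved, `results` can be assigned directly to a dataframe column built from the same rows.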
    

    Thanks to Corey Schafer for his video, which explains everything.
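
A further reduction in requests may be possible if many rows share the same coordinates. That is an assumption about the data, not something the question states. The sketch below looks up each distinct pair once and maps the answers back to every row; `fake_reverse`, the sample pairs, and `max_workers=4` are hypothetical placeholders, not part of the geocoder library.

```python
import concurrent.futures

# Hypothetical rows; the first and third coordinates are identical.
pairs = [(48.85, 2.35), (40.71, -74.00), (48.85, 2.35)]

calls = []  # records every simulated request

def fake_reverse(pair):
    """Stand-in for the real reverse-geocoding request (assumption)."""
    calls.append(pair)
    return f"country@{pair[0]},{pair[1]}"

# dict.fromkeys drops duplicates while keeping first-seen order.
unique_pairs = list(dict.fromkeys(pairs))

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    # Build a pair -> country table from one request per distinct pair.
    lookup = dict(zip(unique_pairs, executor.map(fake_reverse, unique_pairs)))

# Expand the table back to one value per original row.
countries = [lookup[p] for p in pairs]
```

Here three rows need only two simulated requests; how much this helps on real data depends entirely on how often coordinates repeat.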