python pandas solr pysolr

How do I make PySolr drop a connection?


I'm working on time series charts for 300+ clients. It is beneficial for us to pull each client's data separately, because the combined data is huge and in some cases a client's data is resampled or manipulated in a slightly different way.

My problem is that the function I loop through to get each client's data opens 3 new threads but never closes them when the request is complete and the function returns the data (I'm assuming the connection stays open).

Once I have the results of a client, I'd like to close that connection. I just can't figure out how to do that and haven't been able to find anything in my searches.

import pandas as pd
import pysolr

# tier, mode, start_period, end_period and fl_list are set elsewhere (not shown here).
def solr_data_pull(submitterId):
    # A new ZooKeeper client and SolrCloud connection are created on every call.
    zookeeper = pysolr.ZooKeeper('ndhhadr1dnp11,ndhhadr1dnp12,ndhhadr1dnp13:2181/solr')
    solr = pysolr.SolrCloud(zookeeper, collection='tran_timings', timeout=60)

    query = ('SubmitterId:' + str(submitterId) + ' AND Tier:' + tier + ' AND Mode:' + mode + ' '
             'AND Timestamp:[' + str(start_period) + ' TO ' + str(end_period) + '] ')

    results = solr.search(rows=50000, q=[query], fl=[fl_list])

    return pd.DataFrame(list(results))

Solution

  • PySolr uses the Session object from requests under the hood (which in turn uses urllib3's connection pooling), so calling solr.get_session().close() should close all connections and drain the pool. From the requests source:

    def close(self):
        """Closes all adapters and as such the session"""
    

    (SolrCloud is a subclass of Solr, which has the get_session() method.)

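    Disconnecting from ZooKeeper is something you probably shouldn't do in a long-running session, as it will have to set up watches etc. again. If you do need to, you can use the .zk object directly; in pysolr it is an attribute of the ZooKeeper instance you passed to SolrCloud (the zookeeper variable in your code). zk is a KazooClient, whose documentation says: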

    stop()
    Gracefully stop this Zookeeper session.
    
    close()
    Free any resources held by the client.
    
    This method should be called on a stopped client before 
    it is discarded. Not doing so may result in filehandles 
    being leaked.
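
The two clean-up steps above can be sketched as a pair of small helpers. This is only an illustration: the names closing_session and shutdown_zk are hypothetical, not part of pysolr or Kazoo. The only library calls relied on are get_session() on the Solr object, close() on the requests session, and stop()/close() on the KazooClient. Both helpers take already-built objects, so one ZooKeeper client can be reused across all clients and torn down once at the end.

```python
from contextlib import contextmanager


@contextmanager
def closing_session(solr):
    """Yield `solr`, then close its underlying requests session.

    `solr` is assumed to be a pysolr.Solr or pysolr.SolrCloud instance,
    i.e. any object that exposes get_session().
    """
    try:
        yield solr
    finally:
        # Closes the session's adapters, releasing pooled HTTP connections.
        # Runs even if the body of the `with` block raised.
        solr.get_session().close()


def shutdown_zk(zk):
    """Stop a ZooKeeper session, then free the client's resources.

    `zk` is assumed to be the KazooClient created by pysolr.ZooKeeper.
    """
    zk.stop()   # gracefully end the ZooKeeper session
    zk.close()  # Kazoo asks for this on a stopped client before discarding it
```

In the function from the question, the SolrCloud object could be wrapped as `with closing_session(pysolr.SolrCloud(...)) as solr:` so the HTTP connections are released when the block exits. If the ZooKeeper client is created once outside the loop, `shutdown_zk(zookeeper.zk)` would then be called a single time after the last client has been pulled, which avoids re-establishing the watches for every client.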