Connecting to a SolrCloud collection using pysolr


I have configured a multicore SolrCloud and created a collection with 2 shards and no replication. The collection is visible in the Cloud view of the Solr UI.

Through the Solr UI at 192.168.1.56:8983, I am able to get results for my query.

I want to do the same with pysolr, so I tried running the following:

import pysolr
zookeeper = pysolr.ZooKeeper("192.168.1.56:2181,192.168.1.55:2182")
solr = pysolr.SolrCloud(zookeeper, "random_collection")

The last line is not able to find the collection even though it exists. Below is the error trace:

---------------------------------------------------------------------------
SolrError                                 Traceback (most recent call last)
<ipython-input-43-9f03eca3b645> in <module>()
----> 1 solr = pysolr.SolrCloud(zookeeper, "random_collection")

/usr/local/lib/python2.7/dist-packages/pysolr.pyc in __init__(self, zookeeper, collection, decoder, timeout, retry_timeout, *args, **kwargs)
   1176 
   1177     def __init__(self, zookeeper, collection, decoder=None, timeout=60, retry_timeout=0.2, *args, **kwargs):
-> 1178         url = zookeeper.getRandomURL(collection)
   1179 
   1180         super(SolrCloud, self).__init__(url, decoder=decoder, timeout=timeout, *args, **kwargs)

/usr/local/lib/python2.7/dist-packages/pysolr.pyc in getRandomURL(self, collname, only_leader)
   1315 
   1316     def getRandomURL(self, collname, only_leader=False):
-> 1317         hosts = self.getHosts(collname, only_leader=only_leader)
   1318         if not hosts:
   1319             raise SolrError('ZooKeeper returned no active shards!')

/usr/local/lib/python2.7/dist-packages/pysolr.pyc in getHosts(self, collname, only_leader, seen_aliases)
   1281         hosts = []
   1282         if collname not in self.collections:
-> 1283             raise SolrError("Unknown collection: %s", collname)
   1284         collection = self.collections[collname]
   1285         shards = collection[ZooKeeper.SHARDS]

SolrError: (u'Unknown collection: %s', 'random_collection')

The Solr version is 6.6.2 and the ZooKeeper version is 3.4.10.

How do I create a connection to a SolrCloud collection?


Solution

  • Pysolr currently does not support an external ZooKeeper cluster. Pysolr looks for collections in clusterstate.json, which Solr has replaced with a separate state.json for each collection; clusterstate.json is left empty.

    To solve your problem for a single collection, you can hard-code the ZooKeeper.CLUSTER_STATE variable in pysolr.py as follows:

    ZooKeeper.CLUSTER_STATE = '/collections/random_collection/state.json'
    

    pysolr.py can be found at /usr/local/lib/python2.7/dist-packages (the directory shown in the traceback). After editing it, you may need to reinstall it with

    pip install -e /usr/local/lib/python2.7/dist-packages/pysolr.py
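
If you would rather not edit the installed file, the same CLUSTER_STATE override can probably be applied at runtime before the ZooKeeper object is created. This is a minimal sketch, assuming your pysolr version reads ZooKeeper.CLUSTER_STATE as a class attribute when pysolr.ZooKeeper is constructed; the helper names collection_state_path and connect are made up for illustration and are not part of pysolr.

```python
def collection_state_path(collection):
    """Return the ZooKeeper path of a collection's own state.json."""
    return "/collections/{0}/state.json".format(collection)


def connect(zk_hosts, collection):
    """Point pysolr at the per-collection state file, then open a SolrCloud handle.

    Needs pysolr and kazoo installed and a reachable ZooKeeper ensemble.
    """
    # Imported lazily so the path helper above works without pysolr installed.
    import pysolr

    # Assumption: pysolr reads this class attribute while constructing
    # pysolr.ZooKeeper, so it has to be set before the next line runs.
    pysolr.ZooKeeper.CLUSTER_STATE = collection_state_path(collection)

    zookeeper = pysolr.ZooKeeper(zk_hosts)
    return pysolr.SolrCloud(zookeeper, collection)


# Example call, using the hosts and collection name from the question:
#   solr = connect("192.168.1.56:2181,192.168.1.55:2182", "random_collection")
#   results = solr.search("*:*")
```

As with the hard-coded version, this only covers one collection per process, because every pysolr.ZooKeeper instance shares the same CLUSTER_STATE value.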