
How to load a JSON file into Solr using `pysolr`?


The following Python code adds a document, but without the JSON contents:

solr_instance = pysolr.Solr('http://192.168.45.153:8983/solr/test', timeout=60)
json_filename = '/path/to/file/test.json'
argws = {
    'commit': 'true',
    'extractOnly': False,
    'Content-Type': 'application/json',
}
with open(json_filename, 'rb') as f:
    solr_instance.extract(f, **argws)
    solr_instance.commit()

Using curl from the command line works as expected:

$ curl 'http://192.168.45.153:8983/solr/test/update?commit=true' \
     --data-binary @/path/to/file/test.json \
     -H 'Content-Type: application/json'

The file has the following content:

$ cat /cygdrive/w/mist/test.json
-->    [{"x": "a","y": "b"}]

I'm using pysolr 3.6.0 and Solr 6.5.0.


Solution

  • The extract() method refers to a request made against the ExtractingRequestHandler, which is meant to be used for extracting content from rich documents (such as PDFs, etc.).

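    You can use the regular .add method to submit the decoded JSON to Solr. Note that json.load expects an open file object, not a filename:

    import json

    with open(json_filename) as f:
        solr_instance.add(json.load(f))

    This should work, since the file already contains a list of documents, which is what .add expects.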
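
As a slightly fuller sketch of the same idea, the snippet below wraps the load-and-add steps in two small helpers. The names `load_docs` and `index_json_file` are hypothetical (they are not part of pysolr), and the code assumes the file holds either a JSON list of documents, as in the question, or a single JSON object. The `solr` argument is expected to be a `pysolr.Solr` instance; only its `add` and `commit` methods are used.

```python
import json


def load_docs(path):
    """Read a JSON file and return its contents as a list of documents."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # add() takes a list of dicts, so wrap a lone JSON object in a list.
    return data if isinstance(data, list) else [data]


def index_json_file(solr, path):
    """Send every document in the JSON file at `path` to Solr.

    `solr` is assumed to be a pysolr.Solr instance (hypothetical helper).
    Returns the number of documents submitted.
    """
    docs = load_docs(path)
    solr.add(docs)
    # Explicit commit, mirroring the question's code; it may be redundant
    # if the installed pysolr version already commits inside add().
    solr.commit()
    return len(docs)


# Usage, with the URL and path taken from the question:
#   import pysolr
#   solr_instance = pysolr.Solr('http://192.168.45.153:8983/solr/test', timeout=60)
#   index_json_file(solr_instance, '/path/to/file/test.json')
```

Passing the client in as an argument keeps the helpers free of connection details, so the same function can be pointed at a different core or exercised with a stand-in object.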