Tags: python-2.7, pysolr

pysolr: updating a document removes its other fields


Update: pysolr version 3.2.0.

This seems to be a bug in Solr: when an update operation sets nothing, the rest of the document is deleted.

Previously I used pysolr to do atomic updates, but I ran into a problem in the following case.

Suppose the document schema looks like this:

doc = {
   'id':    ...,
   'title': ...,
   'body':  ...,
}

I have indexed a batch of documents, and now I want to add a new field, anchor_text, to every one of them. Here is my code:

import pysolr

solr = pysolr.Solr(url_solr)
doc_update = {
   'id': ...,
   'anchor_text': [a,b,c,...]
}
solr.add([doc_update], fieldUpdates={
    'anchor_text': 'set'
})
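To see what a call like this actually sends to Solr, here is a simplified re-creation (written for illustration, not copied from pysolr) of how a dict plus a fieldUpdates mapping could be serialised into the XML <doc> Solr receives. It assumes, as pysolr 3.x appears to do, that each list element becomes its own <field> tag carrying the update attribute and that None and empty-string values are skipped. The function name build_doc_xml is hypothetical, and Python 3 is assumed.

```python
import xml.etree.ElementTree as ET


def build_doc_xml(doc, field_updates=None):
    """Hypothetical, simplified serialiser (NOT pysolr's real code).

    Assumption: every list element becomes its own <field> element, fields named
    in field_updates get an update="..." attribute, and None / '' are dropped.
    """
    field_updates = field_updates or {}
    doc_elem = ET.Element('doc')
    for name, value in doc.items():
        # Treat every value as a sequence so multi-valued fields repeat the tag.
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            if v is None or v == '':
                continue  # null values produce no <field> element at all
            attrs = {'name': name}
            if name in field_updates:
                attrs['update'] = field_updates[name]
            field = ET.SubElement(doc_elem, 'field', attrs)
            field.text = str(v)
    return ET.tostring(doc_elem, encoding='unicode')


print(build_doc_xml({'id': '1', 'anchor_text': []}, {'anchor_text': 'set'}))
# <doc><field name="id">1</field></doc>
```

Under these assumptions an empty anchor_text list contributes no <field update="set"> element at all, so only the id field is sent. Solr treats a document without any update modifiers as a plain add, which would replace the stored document with one containing only id.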

But I found that some of the original documents were wiped, with only the id field left. After the update they look like this:

doc = {
  'id':...
}

In particular, the documents whose anchor_text value is an empty list lose their original fields, while the others do not (this is a guess, since I have only checked a few cases).
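If the empty lists are indeed the trigger, one defensive option is to keep those documents out of the fieldUpdates batch entirely. Below is a minimal sketch, assuming pysolr drops None and empty-string values when it builds the request; split_atomic_updates and is_null are hypothetical helpers, not part of pysolr, and the solr object in the final comment is assumed to be an existing pysolr.Solr instance.

```python
def is_null(value):
    """Assumption: None and '' are treated as 'no value' and never sent."""
    return value is None or value == ''


def split_atomic_updates(docs, field):
    """Partition update dicts by whether `field` has at least one real value.

    Documents with nothing to set would otherwise reach Solr as a bare
    {'id': ...} document, i.e. a full replacement rather than an update.
    """
    to_send, empty = [], []
    for doc in docs:
        value = doc.get(field)
        values = value if isinstance(value, (list, tuple)) else [value]
        if any(not is_null(v) for v in values):
            to_send.append(doc)
        else:
            empty.append(doc)
    return to_send, empty


docs = [
    {'id': '1', 'anchor_text': ['a', 'b']},
    {'id': '2', 'anchor_text': []},
]
to_send, empty = split_atomic_updates(docs, 'anchor_text')
print([d['id'] for d in to_send], [d['id'] for d in empty])  # ['1'] ['2']
# With an existing pysolr.Solr instance (assumed, not created here):
# solr.add(to_send, fieldUpdates={'anchor_text': 'set'})
```

The documents collected in `empty` can then be skipped, or have their field cleared by some other means, depending on what the index should contain.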

I've looked at the source code but found nothing valuable. What's going on here?

What is the correct way to update a document with pysolr?


Solution

  • I ran into the same issue (Python 3.6, pysolr 3.6, Solr 6.4.1). Since I couldn't find any more information online, I used a workaround that sends the JSON atomic update directly with requests; I'm leaving it here in case it's useful to anyone.

    import requests
    
    def update_single_solr_field(doc_id_field, doc_id, field_update_name, field_update_value,
                                 base_url='http://localhost:8983/solr/mysolrcore/'):
        """Atomically update one field of the document whose id is doc_id.
    
        Only field_update_name is set to field_update_value; all other stored
        fields of the document are left intact.
        """
        update_url = base_url + 'update?commit=true'
    
        # Solr JSON atomic-update syntax: {"<id field>": ..., "<field>": {"set": <value>}}
        payload = [{
            doc_id_field: doc_id,
            field_update_name: {
                'set': field_update_value
            }
        }]
    
        # json= serialises the payload and sets Content-Type: application/json
        response = requests.post(update_url, json=payload)
        response.raise_for_status()  # raise on 4xx/5xx instead of failing silently
        return response
    
    # example
    id_field_name = 'id'
    doc_id_to_update = '1700370208'
    field_to_update = 'weight_field'
    field_to_update_value = 20000
    response_update = update_single_solr_field(id_field_name, doc_id_to_update, field_to_update, field_to_update_value)
    
    print(response_update.status_code, response_update.text)
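The workaround above sends one document per request. A batch variant is sketched below, assuming the same Solr JSON /update endpoint and core URL. It uses only the standard library (urllib.request, Python 3) instead of requests, and the names build_set_payload and post_updates, as well as the document id '42', are hypothetical. Because every document carries an explicit {"set": ...} entry, an empty list stays in the request; the Solr Reference Guide describes set with null or an empty list as removing the field's values, so such a document is not turned into a plain id-only add.

```python
import json
import urllib.request


def build_set_payload(id_field, updates, field):
    """Build a JSON atomic-update body that 'set's `field` on each document.

    `updates` maps document id -> new value; an empty list is kept as
    {"set": []} instead of being silently dropped.
    """
    return [{id_field: doc_id, field: {'set': value}} for doc_id, value in updates.items()]


def post_updates(update_url, payload, timeout=30):
    """POST the payload as JSON and return the raw response body as text.

    update_url is assumed to look like
    http://localhost:8983/solr/mysolrcore/update?commit=true
    """
    body = json.dumps(payload).encode('utf-8')
    request = urllib.request.Request(
        update_url, data=body, headers={'Content-Type': 'application/json'}, method='POST')
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode('utf-8')


payload = build_set_payload('id', {'1700370208': ['a', 'b'], '42': []}, 'anchor_text')
print(json.dumps(payload))
# Requires a running Solr instance at the assumed URL:
# post_updates('http://localhost:8983/solr/mysolrcore/update?commit=true', payload)
```

Batching keeps the number of HTTP round trips and commits low; for very large batches it may be preferable to drop commit=true and commit once at the end.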