
Failed to add documents to Solr: Solr responded with an error (HTTP 400) (django + haystack + solr)


I currently have Solr 4.2.0 working in production (set up around 2012). I have set up a new development environment in which I upgraded all packages (Django 1.8.10, PySolr 3.4.0, Haystack 2.4.1) and installed Solr 5.5.0.

In short

I have Solr running and my core/collection created with 'basic_configs'. It seems to work well, except that during indexing I get a lot of errors similar to these:

All documents removed.
Indexing 9604 contracts
Failed to add documents to Solr: Solr responded with an error (HTTP 400): [Reason: ERROR: [doc=accounting.contract.22] unknown field 'status']
Failed to add documents to Solr: Solr responded with an error (HTTP 400): [Reason: ERROR: [doc=accounting.contract.70556] unknown field 'date_signed']
Failed to add documents to Solr: Solr responded with an error (HTTP 400): [Reason: ERROR: [doc=accounting.contract.72059] unknown field 'date_signed']
Failed to add documents to Solr: Solr responded with an error (HTTP 400): [Reason: ERROR: [doc=accounting.contract.73458] unknown field 'date_signed']

Judging by the IDs, most documents are indexed fine, but these errors appear frequently (the list goes on) across all tables/indexes.
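To see at a glance which fields Solr rejects, and for which documents, the error output can be summarised with a small script. This is a hypothetical helper (unknown_fields is not part of Haystack or pysolr); it only reads the text of the messages and assumes each error sits on a single line, not wrapped as it is when copied from a narrow terminal.

```python
import re

# One rejected document per match, e.g.
#   [Reason: ERROR: [doc=accounting.contract.22] unknown field 'status']
UNKNOWN_FIELD = re.compile(r"\[doc=([^\]]+)\] unknown field '([^']+)'")


def unknown_fields(log_text):
    """Map each field name Solr rejected to the document IDs that used it."""
    fields = {}
    for doc_id, field in UNKNOWN_FIELD.findall(log_text):
        # Group document IDs under the field that caused the HTTP 400
        fields.setdefault(field, []).append(doc_id)
    return fields
```

For example, saving the output of rebuild_index to a file and passing its contents to unknown_fields would return a mapping such as {'status': [...], 'date_signed': [...]}, which shows whether only a handful of fields are affected.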

Eventually I followed this promising GitHub project guide, but unfortunately it did not solve the problem for me.

What I did, step by step

  1. Successfully installed Solr 5.5.0 (web interface working at localhost:8983), using this guide
  2. Created a collection called 'spng', using the following command: sudo su - solr -c '/opt/solr/bin/solr create -c spng -d basic_configs'
  3. Overwrote my solr.xml (/srv/spng/src/django-haystack/haystack/templates/search_configuration/solr.xml) with the solr.xml from the GitHub project guide mentioned earlier
  4. Just to be sure, I gave the solr.xml file 777 permissions.

My settings.py has the following entry:

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
        'URL': 'http://localhost:8983/solr/spng',
        'DEFAULT_OPERATOR': 'AND',
        'INCLUDE_SPELLING': True,
    },
}
  5. Generated a schema.xml (python manage.py build_solr_schema) and placed it in /var/solr/data/spng/conf/schema.xml
  6. Again, just to be sure, gave the schema.xml file 777 permissions as well.
  7. Reloaded the core with curl: curl 'http://localhost:8983/solr/admin/cores?action=RELOAD&core=spng&wt=json&indent=true'

The response was:

{
  "responseHeader":{
    "status":0,
    "QTime":300}}
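The same reload can be done from Python by deriving the CoreAdmin URL from the URL already configured in HAYSTACK_CONNECTIONS. A minimal standard-library sketch; core_reload_url and reload_core are hypothetical helper names, and it assumes the connection URL has the form http://host:port/solr/<core>, as 'http://localhost:8983/solr/spng' does.

```python
import json
from urllib.parse import urlencode, urlsplit, urlunsplit
from urllib.request import urlopen


def core_reload_url(connection_url):
    """Build the CoreAdmin RELOAD URL for the core a Haystack URL points at.

    Assumes connection_url looks like http://host:port/solr/<core>.
    """
    parts = urlsplit(connection_url)
    # "/solr/spng" -> base path "/solr", core name "spng"
    base_path, _, core = parts.path.rstrip("/").rpartition("/")
    query = urlencode({"action": "RELOAD", "core": core, "wt": "json"})
    return urlunsplit(
        (parts.scheme, parts.netloc, base_path + "/admin/cores", query, "")
    )


def reload_core(connection_url, timeout=30):
    """Send the RELOAD request and return Solr's status (0 means success)."""
    with urlopen(core_reload_url(connection_url), timeout=timeout) as resp:
        return json.load(resp)["responseHeader"]["status"]
```

Called as reload_core(settings.HAYSTACK_CONNECTIONS['default']['URL']), it should return 0 when Solr answers with a response header like the one above. Reusing the configured URL keeps the reload pointed at the same core Haystack indexes into.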
  8. Also restarted uWSGI and Solr, just to be sure
  9. Ran python manage.py rebuild_index

This produces the same errors mentioned before:

All documents removed.
Indexing 9604 contracts
Failed to add documents to Solr: Solr responded with an error (HTTP 400): [Reason: ERROR: [doc=accounting.contract.22] unknown field 'status']
Failed to add documents to Solr: Solr responded with an error (HTTP 400): [Reason: ERROR: [doc=accounting.contract.70556] unknown field 'date_signed']
Failed to add documents to Solr: Solr responded with an error (HTTP 400): [Reason: ERROR: [doc=accounting.contract.72059] unknown field 'date_signed']
Failed to add documents to Solr: Solr responded with an error (HTTP 400): [Reason: ERROR: [doc=accounting.contract.73458] unknown field 'date_signed']

Does anyone have any idea what might be wrong? The indexing works without errors on my production server, running 4.2.0. Did I miss a setting or is Solr 5.5.0 causing these errors?
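One way to narrow this down is to check whether a given schema file actually declares the rejected fields, either explicitly or through a dynamic field pattern. A hedged sketch using only the standard library; undeclared_fields is a hypothetical helper, and it assumes Solr dynamic field names use a single leading or trailing *, which fnmatch handles.

```python
import xml.etree.ElementTree as ET
from fnmatch import fnmatchcase


def undeclared_fields(schema_text, field_names):
    """Return the names in field_names that a Solr schema does not declare.

    schema_text may be str or bytes (e.g. the contents of schema.xml or
    managed-schema). A name counts as declared if it matches a <field>
    exactly or a <dynamicField> pattern such as "*_s".
    """
    root = ET.fromstring(schema_text)
    # iter() searches the whole tree, so fields inside a <fields> wrapper
    # (as in Haystack's generated schema) and top-level fields both count
    explicit = {el.get("name") for el in root.iter("field")}
    patterns = [el.get("name") for el in root.iter("dynamicField")]
    return [
        name
        for name in field_names
        if name not in explicit
        and not any(fnmatchcase(name, p) for p in patterns)
    ]
```

Running it against each schema file in the core's conf directory, e.g. undeclared_fields(Path('/var/solr/data/spng/conf/managed-schema').read_bytes(), ['status', 'date_signed']), shows which file lacks the fields Haystack sends.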


Solution

  • Special thanks to elyograg for helping me out on Solr's IRC channel (#solr on freenode).

    elyograg: if you're using the stock solrconfig.xml from basic_configs, then your schema is located in a file named "managed-schema" -- ALL example configs are using the managed schema by default as of 5.5.

    elyograg: put it (schema.xml contents) into managed-schema. You could potentially change the solrconfig.xml, but life will be easier for people trying to help you if you keep the defaults.

    In other words: as of Solr 5.5, a collection created with basic_configs reads its schema from a file named 'managed-schema', not from schema.xml (in my case located at /var/solr/data//conf/managed-schema).

    After updating the file and reloading the core, indexing finished without errors.

    Be careful with future versions, because elyograg also noted:

    elyograg: It might also be a good idea to add the .xml extension. I don't think the lack of an extension is going to be much of a deterrent to hand-editing.

    So in future versions the file may be called managed-schema.xml.
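The fix above can be scripted so that it does not depend on the file name: read the schema file name from the core's solrconfig.xml, then copy the schema generated by build_solr_schema over it. A minimal sketch under stated assumptions: schema_resource_name and install_schema are hypothetical helpers; the default name "managed-schema" reflects Solr 5.x and may differ in other versions; when solrconfig.xml declares no schemaFactory the default depends on the Solr version, so the helper returns None rather than guessing. The core still has to be reloaded after the copy.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def schema_resource_name(solrconfig_text):
    """Return the schema file name solrconfig.xml tells Solr to load.

    ClassicIndexSchemaFactory reads schema.xml; ManagedIndexSchemaFactory
    reads managedSchemaResourceName (assumed default: "managed-schema").
    Returns None when no <schemaFactory> is declared.
    """
    factory = ET.fromstring(solrconfig_text).find("schemaFactory")
    if factory is None:
        return None
    # The class may be written with or without the "solr." prefix
    if factory.get("class", "").endswith("ClassicIndexSchemaFactory"):
        return "schema.xml"
    for el in factory.findall("str"):
        if el.get("name") == "managedSchemaResourceName":
            return (el.text or "").strip() or "managed-schema"
    return "managed-schema"


def install_schema(generated_schema, conf_dir, resource_name):
    """Copy a generated schema over the file Solr loads, keeping a .bak copy."""
    target = Path(conf_dir) / resource_name
    if target.exists():
        # Preserve the current schema in case the new one breaks the core
        shutil.copy2(target, target.with_name(target.name + ".bak"))
    shutil.copyfile(generated_schema, target)
    return target
```

For the core in this question that would mean reading /var/solr/data/spng/conf/solrconfig.xml, which for basic_configs in 5.5 should yield managed-schema, copying the generated schema there, and then reloading the core as before.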