Tags: python-3.x, solr, field, pysolr, solr-schema

Solr Question about Loading Changes to Schema


I'm new to Solr and received the following error when adding a document through pysolr:

pysolr.SolrError: Solr responded with an error (HTTP 400): [Reason: ERROR: [doc=bc4aa768-6f35-4888-80e0-1578d9971b3c] Error adding field 'periodical_nlm'='2984692R' msg=For input string: "2984692R"]

I eventually found that the first periodical_nlm value added was 404536.0, so I assumed it was a type issue. In Python I then cast every periodical_nlm value explicitly to a string before adding 2984692R, but the error persisted.

I Googled a bit and found that I should probably explicitly tell Solr that I want that field to be a string. I've not gotten very "hands on" with the schema yet, so I just had some questions:

(1) There appear to be two schema files: managed-schema in the directory for the core and managed-schema in the conf folder of the core. I'm assuming the schema that is actually in use is the one in the conf folder?

(2) Which do I update in order for things to proceed smoothly? I attempted adding the following to the schema file in the core directory but the error persisted:

<field name="periodical_nlm" type="string" indexed="true" stored="true" required="false" multiValued="false" />

Do I need to rerun some initialization process or add something to the conf file separately?

Thank you so much and please let me know if you need more info. I'm running on a Windows 10 Home x64 platform (not sure if that's important if there are any command-line things I need to run...).


Solution

  • As long as you reload the core after changing the managed-schema file under conf, you should be fine. Be aware that you should do this before indexing any content, so you might need to clean out the index by deleting everything, then change the schema, reload the core, and re-index your content. Changing the schema does not change content that has already been indexed.

    Otherwise your assumption is correct. In schemaless mode the field type is determined by the format of the first value submitted, not by its type: type information usually isn't included in the request at all, since every value is submitted as a string, so Solr guesses the type by applying a hierarchy of pattern matching. Schemaless mode is useful for prototyping, but when you move to production you should always define the schema explicitly to avoid issues like the one you've seen here.
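
The "delete everything, then reload" step can be done over HTTP, so no command-line tools are needed on Windows. Below is a minimal sketch using only the Python standard library. The base URL http://localhost:8983/solr and the core name mycore are assumptions; substitute your own. It relies on Solr's CoreAdmin RELOAD action and the JSON delete-by-query update command, and assumes you have already edited conf/managed-schema on disk.

```python
import json
import urllib.parse
import urllib.request

SOLR_BASE = "http://localhost:8983/solr"  # assumption: default local Solr address
CORE = "mycore"                           # hypothetical core name


def reload_url(base, core):
    """Build the CoreAdmin URL that reloads one core."""
    query = urllib.parse.urlencode({"action": "RELOAD", "core": core})
    return f"{base}/admin/cores?{query}"


def delete_all_request(base, core):
    """Build a POST request that deletes every document and commits."""
    body = json.dumps({"delete": {"query": "*:*"}}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/{core}/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def reset_core(base=SOLR_BASE, core=CORE):
    """Clear the index, then reload the core so the edited schema is picked up."""
    with urllib.request.urlopen(delete_all_request(base, core)) as resp:
        resp.read()
    with urllib.request.urlopen(reload_url(base, core)) as resp:
        resp.read()
```

Calling reset_core() against a running Solr instance should leave you with an empty index and the new schema loaded, after which you can re-add your documents through pysolr as before. If you would rather stay inside pysolr for the delete, solr.delete(q="*:*") followed by a commit should have the same effect.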
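
To make the type-guessing behaviour concrete, here is a rough simulation in Python. It is not Solr's code: the patterns, their order, and the type labels ("boolean", "long", "double", "date", "text") are simplified assumptions chosen only to illustrate why casting to str on the Python side made no difference.

```python
import re
from datetime import datetime

# Simplified patterns; Solr's real parsers are more lenient than these.
LONG_RE = re.compile(r"[+-]?\d+")
DOUBLE_RE = re.compile(r"[+-]?(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?")


def is_date(value):
    """True if the value looks like an ISO-8601 UTC timestamp."""
    try:
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
        return True
    except ValueError:
        return False


def guess_type(value):
    """Guess a field type from the text of the first value seen."""
    if value.lower() in ("true", "false"):
        return "boolean"
    if LONG_RE.fullmatch(value):
        return "long"
    if DOUBLE_RE.fullmatch(value):
        return "double"
    if is_date(value):
        return "date"
    return "text"  # fallback when nothing stricter matches


def fits(field_type, value):
    """Would a later value be accepted by a field fixed to field_type?"""
    if field_type == "text":
        return True
    if field_type == "double":
        return bool(DOUBLE_RE.fullmatch(value))
    return guess_type(value) == field_type


first_type = guess_type("404536.0")        # the first periodical_nlm value
later_ok = fits(first_type, "2984692R")    # the value that triggered the error
```

Here first_type comes out as "double" and later_ok as False: once the field has been created from the first value, a later value such as 2984692R no longer fits, whether or not it was a str in Python. Declaring the field as string in the schema up front avoids the guess entirely.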