I am trying to import a single field from one Solr core into another core using DIH. The Solr version is 6.4.0.
My managed-schema file has the following entries:
<uniqueKey>journal</uniqueKey>
<field name="journal" type="text_general" multiValued="false" indexed="true" stored="true" />
<field name="fjournal" type="string" indexed="true" stored="false"/>
and also one copyField setting, as shown below:
<copyField source="journal" dest="fjournal" />
In solrconfig.xml, I configured the following elements:
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />
<requestHandler>
  <lst name="defaults">
    <str name="config">solr-data-config.xml</str>
  </lst>
</requestHandler>
<updateRequestProcessorChain>
  <processor class="solr.UniqFieldsUpdateProcessorFactory">
    <str name="fieldName">journal</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
And the following is in the file "solr-data-config.xml":
<dataConfig>
  <document>
    <entity name="journalMaster" processor="SolrEntityProcessor"
            url="http://localhost:8983/solr/journalMaster "
            query="*:*"
            fl="journal"/>
  </document>
</dataConfig>
When I run the import, the target core still contains duplicate documents after the import has completed:
{
  "journal":"Journal of Immunology",
  "_version_":1559554209274134528,
  "fjournal":"Journal of Immunology"},
{
  "journal":"Journal of Immunology",
  "_version_":1559554209373749248,
  "fjournal":"Journal of Immunology"},
{
  "journal":"Journal of Immunology",
  "_version_":1559554209375846400,
  "fjournal":"Journal of Immunology"},
How do I prevent this? I am importing the data from one local core into another.
Any help will be really appreciated.
When defining a uniqueKey, you don't need to analyse the content. Use a plain string field whose value uniquely identifies each document. This unique identifier is used across a lot of different Lucene/Solr functionality, so it is important to define it properly.
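In your example I would use a string field such as 'fjournal' as the unique key. Note that Solr rejects a uniqueKey that is the destination of a copyField, so 'fjournal' would have to be populated directly by the import; alternatively, 'journal' itself could be declared as a string.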
Then there is nothing else to worry about: every time you re-index the same fjournal value, the existing Solr document will be overwritten, so you will end up with a single entry per value.
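As a rough illustration of the string-key idea, here is a minimal managed-schema sketch. It is not the configuration from the question and it swaps in a slightly different layout: instead of keying on 'fjournal', it declares 'journal' itself as a string, because a copyField destination is not accepted as the uniqueKey. The 'journal_text' field is a hypothetical name introduced only for this example, and 'required="true"' is an assumption about what you want.

```xml
<!-- managed-schema: sketch only, reusing the field names from the question -->
<uniqueKey>journal</uniqueKey>

<!-- "string" is not analysed, so the whole title acts as the key
     and two documents with the same title overwrite each other -->
<field name="journal" type="string" multiValued="false"
       indexed="true" stored="true" required="true" />

<!-- hypothetical analysed copy (name invented for this example),
     only needed if tokenized full-text search on the title is wanted -->
<field name="journal_text" type="text_general" indexed="true" stored="false" />
<copyField source="journal" dest="journal_text" />
```

A schema change like this does not merge duplicates that are already in the index, so the target core would most likely need to be emptied and re-imported before the effect is visible.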
Probably a better question would be why you need to index documents that contain only a single field ...