I'm writing a Spark job in Java. I need to insert data from a DataFrame into a Solr collection using the Lucidworks spark-solr tools (https://github.com/lucidworks/spark-solr).
My schema.xml:
<schema name="MY_NAME" version="1.6">
<field name="_version_" type="long" indexed="true" stored="true" />
<field name="_root_" type="string" indexed="true" stored="false" />
<field name="ignored_id" type="ignored" />
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="age" type="int" indexed="true" stored="true" required="false" multiValued="false" />
<field name="height" type="tlong" indexed="true" stored="true" required="false" multiValued="false" />
<field name="name" type="string" indexed="true" stored="true" required="false" multiValued="false" />
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0" />
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0" />
<fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" positionIncrementGap="0" />
<fieldType name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" />
<uniqueKey>id</uniqueKey>
</schema>
My DataFrame:
DataFrame df = sqlContext.sql("SELECT id, age, height, name FROM TABLE");
df.show() gives:
+--------------------+-----------+------+------+
|                  id|        age|height|  name|
+--------------------+-----------+------+------+
|12345678912345678...|         10|   101| hello|
But when I try to insert into my Solr collection with:
df.write()
.format("solr")
.option("collection", MY_COLLECTION)
.option("zkhost", MY_ZKHOST)
.save();
I get the following error:
Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://MY_IP/solr/MY_COLLECTION_SHARD_REPLICA: ERROR :[doc=123456789123456789] unknown field '_indexed_at_tdt'
I don't understand where the field "_indexed_at_tdt" comes from.
The DataFrame seems correct, with only the 4 fields I want to insert, but I still can't insert into my Solr collection because of this unknown field "_indexed_at_tdt".
More information: I have an HBase Indexer that inserts into the same collection and works.
Thanks in advance for your help!
It seems that this field is added automatically by the Lucidworks code.
You should just add the corresponding field to the schema (which also needs a "tdate" field type, not defined in the schema shown above) and it will work:
<field name="_indexed_at_tdt" type="tdate" indexed="true" stored="true" required="false" multiValued="false" />
Or, if you prefer, make it a dynamic field matching *_tdt.
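As a rough sketch of the dynamic-field variant, the schema additions could look like the fragment below. It assumes the Trie-based field types already used in the question's schema; the "tdate" definition follows the one commonly found in Solr's example schemas, and its precisionStep value is an assumption you may want to adjust. Both elements go inside the existing <schema> element.

```xml
<!-- Assumed date type: needed because the schema in the question defines no "tdate" fieldType -->
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0" />

<!-- Accepts any field whose name ends in _tdt, including _indexed_at_tdt -->
<dynamicField name="*_tdt" type="tdate" indexed="true" stored="true" />
```

After changing the schema, the collection has to be reloaded (and the updated config uploaded to ZooKeeper in SolrCloud mode) before the write is retried.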