I've indexed data into my Solr core using a curl command, where the data was in CSV format. The command was:
curl 'http://localhost:8983/solr/my_collection/update?commit=true' --data-binary @my_docs.csv -H 'Content-type:application/csv'
The data was imported successfully, but I have an issue with a multiValued field.
In my .csv file the value for the multiValued field was like this: "['parking','garden','spa']"
So now the imported data in my Solr core ends up with extra double quotes, in the format below:
"amenities": [
"['parking', 'garden', 'spa']"
^ ^
]
To remove those double quotes from my multiValued field, I tried an atomic update from the Documents section of the Solr Admin UI, and it succeeded with this JSON format:
{
  "id": "2118506",
  "amenities": {"set": ["parking", "garden", "spa"]}
}
I know I can atomically update all of the indexed documents this way by sending curl requests to Solr with "set", but that is hard for me at this moment because I already have 20M documents indexed.
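For reference, the equivalent curl request for a single document would look roughly like this (reusing the collection name and document id from above):
curl 'http://localhost:8983/solr/my_collection/update?commit=true' -H 'Content-type:application/json' -d '[{"id":"2118506","amenities":{"set":["parking","garden","spa"]}}]'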
So I just want to know: is there any way to remove the double quotes from the multiValued field at query time, or any smarter way to remove the double quotes from the field value with a single curl command, without specifying individual document ids?
N.B. It is hard for me at this point to remove the double quotes from each and every CSV file and re-index the documents.
The reason for the double quotes is that your value is being indexed as a single string - it's not being indexed as a multivalued field. The double quotes are there because that's how JSON indicates that the value is a string.
You'll need to change this when indexing your data, and you can use a few special arguments when indexing CSV:
f.amenities.split=true&f.amenities.separator=%2C
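For example, the original indexing command could be rerun with those parameters added (a sketch that reuses the collection and file names from the question):
curl 'http://localhost:8983/solr/my_collection/update?commit=true&f.amenities.split=true&f.amenities.separator=%2C' --data-binary @my_docs.csv -H 'Content-type:application/csv'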
That way the values will be indexed as an actual multivalued field, by splitting the value of the field on the , character. If you have an actual JSON-style list in your CSV file, I strongly recommend removing the [, ' and ] characters from the field as a preprocessing step.
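A rough sketch of that preprocessing, assuming the [, ] and ' characters only ever appear in the amenities column (otherwise use a proper CSV-aware script instead); -i is GNU sed's in-place flag:
sed -i "s/[][']//g" my_docs.csv
After that, re-index with the split/separator parameters shown above.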