Since I suspect my setup is rather conventional, I'd like to start by providing a little context. Our Solr setup involves three environments:
For the most part, this works well. We have a lot of records, so a full reindex takes a few hours (despite eager loading and using background jobs to parallelize the work). That's not terrible, though, since we rarely need to do a complete reindex.
But there's another scenario that is becoming very annoying. We often need to populate our local machine (or staging environment) with production data, i.e., grab a SQL dump from production and pipe it into our local database. We do this all the time for bug fixes and whatnot.
At this point, because our data has changed, our local Solr index is out of date. So if we want search to work correctly, we also have to reindex our local Solr server, and that takes a really long time.
So here's the question: rather than doing a full reindex, I'd like to simply copy the production index down onto my machine (conceptually similar to a SQL dump, but for a Solr server rather than a database). I've Googled around enough to know that this is possible, but I haven't seen any solutions specific to Websolr / Sunspot. These are such common tools that I figured someone else must have solved this already.
Thanks in advance for any help!
One of the better-kept secrets of Solr (and Websolr): you can use the Solr Replication API to copy the data between two indices.
If you're making a copy of the production index "prod54321" into the QA index "qa12345", you'd initiate the replication by sending the `fetchindex` command to the QA index's `replication` handler. Here's a quick command that approximates that, using cURL.
```shell
curl -X POST https://index.websolr.com/solr/qa12345/replication \
  -d command=fetchindex \
  -d masterUrl=https://index.websolr.com/solr/prod54321/replication
```
(Note the references to the `replication` request handler on both URLs.)
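If you do this regularly, it may help to wrap the call in a few shell functions. The sketch below is illustrative only. The function names (`solr_fetch_index`, `solr_index_version`, `solr_replication_details`) are hypothetical. It assumes each argument is a core base URL without the trailing `/replication`, and that the target core exposes a `/replication` request handler. Besides `fetchindex`, it calls the handler's `indexversion` and `details` commands. Polling these is one way to check whether the copy has finished, since the `fetchindex` request may return before all the data has been transferred.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around Solr's ReplicationHandler.
# Each argument is assumed to be a core base URL, e.g.
# https://index.websolr.com/solr/qa12345 (no trailing /replication).

# Ask the target core to pull a full copy of the source core's index.
solr_fetch_index() {
  local source_url="$1" target_url="$2"
  curl -sS -X POST "${target_url}/replication" \
    -d command=fetchindex \
    -d "masterUrl=${source_url}/replication"
}

# Print the index version and generation a core reports (JSON).
solr_index_version() {
  local core_url="$1"
  curl -sS "${core_url}/replication?command=indexversion&wt=json"
}

# Print the replication details for a core (JSON); useful for polling progress.
solr_replication_details() {
  local core_url="$1"
  curl -sS "${core_url}/replication?command=details&wt=json"
}

# Example usage (index names taken from the example above):
#   solr_fetch_index https://index.websolr.com/solr/prod54321 \
#                    https://index.websolr.com/solr/qa12345
#   solr_replication_details https://index.websolr.com/solr/qa12345
```

To copy production down to your own machine, you would pass your local core's URL as the target instead. That assumes your local Solr can reach the production URL (including any authentication your Websolr index requires) and has a replication handler configured in its `solrconfig.xml`.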