java spring solr solrj solrcloud

SolrJ deleteById does not delete data in Solr


I have a Solr collection with 6 shards split by year, 2019 to 2024. I use this call to delete some documents from the collection:

invoke(() -> solrClient().deleteById(collectionName, ids ));

but this does not actually delete the documents for the corresponding ids, even after waiting a day. However, the code below works and deletes the documents instantly.

invoke(() -> solrClient().deleteById(collectionName, ids, 1000));
try {
    solrClient().commit(collectionName);
} catch (SolrServerException | IOException e) {
    throw new RuntimeException(e);
}

Can someone please explain what is going on here, and what the significance is of the commitWithinMs value, which I set to 1000? I'm not sure whether I should keep it at 1000 ms or increase it.

I'm using Solr version 8.9

I tried passing 1000 as the commitWithinMs parameter to deleteById and committing right afterwards, and that worked. But I thought Solr autocommits, and I can see an autoCommit time configured in solrconfig.xml:

<autoCommit>
    <!-- in ms, our setting is 10 min -->
    <maxTime>600000</maxTime>
    <maxDocs>100000</maxDocs>
    <openSearcher>false</openSearcher>
</autoCommit>

Also, just passing commitWithinMs is not sufficient; I have to commit explicitly right after I invoke deleteById.


Solution

  • In your first example you only submit the delete; you never commit it. Changes do not become permanent until a commit happens; until then they are just pending. If the server restarts before any commit is issued, the change can be lost. It also will not change what a search returns until a commit is issued and a new searcher is opened (which becomes important further down).

    In your second example you do issue a commit, so your changes become visible. You actually do it in two different ways at once: with commitWithin and with an explicit commit.

    commitWithin tells Solr to automagically issue a commit if none has been issued within the given time. This is useful when multiple clients index content in parallel: you don't want every client to commit after every document, but you still want updates to become visible within a certain timeframe. In other words, it is especially useful in a busy setting where many updates arrive within a short time. If you only update infrequently, you can simply issue a commit with every update; there is no performance penalty, since the commit would have happened within a second anyway and there aren't any other updates in that time frame.
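To make the two options concrete, here is a minimal sketch of the same choice at the HTTP level, using only the JDK's HTTP client rather than SolrJ. It builds a JSON delete-by-id request against the collection's /update handler with either a commitWithin parameter or commit=true. The class name SolrDeleteSketch, the base URL http://localhost:8983/solr, the collection name mycollection and the document ids are all assumptions for illustration, and the convention that a negative commitWithinMs means "commit explicitly" is invented for this sketch only.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.stream.Collectors;

final class SolrDeleteSketch {

    /** Builds the update URL. A negative commitWithinMs (sketch convention) means commit=true instead. */
    static URI updateUri(String baseUrl, String collection, int commitWithinMs) {
        String param = commitWithinMs >= 0 ? "commitWithin=" + commitWithinMs : "commit=true";
        // Assumes the collection name needs no URL encoding.
        return URI.create(baseUrl + "/" + collection + "/update?" + param);
    }

    /** Builds a JSON delete-by-id body such as {"delete":["id1","id2"]}. */
    static String deleteBody(List<String> ids) {
        return ids.stream()
                .map(id -> "\"" + id.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
                .collect(Collectors.joining(",", "{\"delete\":[", "]}"));
    }

    /** Assembles the POST request without sending it. */
    static HttpRequest deleteRequest(String baseUrl, String collection, List<String> ids, int commitWithinMs) {
        return HttpRequest.newBuilder(updateUri(baseUrl, collection, commitWithinMs))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(deleteBody(ids)))
                .build();
    }

    /** Sends the request to a running Solr node and returns the HTTP status code. */
    static int send(HttpRequest request) throws Exception {
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) {
        List<String> ids = List.of("doc-1", "doc-2"); // hypothetical ids
        // Only prints what would be sent; call send(...) against a real node to execute it.
        System.out.println(updateUri("http://localhost:8983/solr", "mycollection", 1000));
        System.out.println(updateUri("http://localhost:8983/solr", "mycollection", -1));
        System.out.println(deleteBody(ids));
    }
}
```

With several writers, each can send the commitWithin variant and let Solr batch the commits; a single, infrequent writer can use the commit=true variant and see the result immediately.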

    The last issue, why autoCommit alone did not help, comes down to this:

    You have openSearcher set to false, so your autocommits don't cause a new searcher to be opened. This means the searcher (the module responsible for actually looking up your documents in the index based on your query) is still the old one that uses the old index; it never switches over to the changed index after your commit.

    From the reference guide:

    If this is false, the commit will flush recent index changes to stable storage, but does not cause a new searcher to be opened to make those changes visible.

    So the changes will be persisted within ten minutes, but won't be visible until a new searcher is opened. That can happen by an explicit commit somewhere else, by an optimize, or by the Solr server being restarted.
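The difference between "persisted" and "visible" can be illustrated with a toy model. This is not how Solr is implemented; it is a sketch with hypothetical names (ToyIndex, isOnDisk, isVisible) that only mirrors the behaviour described above: a commit flushes pending deletes, but searches keep reading the old snapshot until a new searcher is opened.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model only: tracks which ids are pending, flushed, and visible to searches. */
final class ToyIndex {
    private final Set<String> pendingDeletes = new HashSet<>(); // accepted but not yet committed
    private final Set<String> durable = new HashSet<>();        // what is on stable storage
    private Set<String> searcherView;                           // snapshot the current searcher reads

    ToyIndex(List<String> ids) {
        durable.addAll(ids);
        searcherView = new HashSet<>(durable);
    }

    void deleteById(String id) {
        pendingDeletes.add(id);
    }

    /** A commit flushes pending deletes; only openSearcher=true refreshes the searcher snapshot. */
    void commit(boolean openSearcher) {
        durable.removeAll(pendingDeletes);
        pendingDeletes.clear();
        if (openSearcher) {
            searcherView = new HashSet<>(durable);
        }
    }

    boolean isOnDisk(String id) {
        return durable.contains(id);
    }

    boolean isVisible(String id) {
        return searcherView.contains(id);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex(List.of("a", "b"));
        index.deleteById("a");

        index.commit(false); // like autoCommit with openSearcher=false
        System.out.println("onDisk=" + index.isOnDisk("a") + " visible=" + index.isVisible("a"));
        // prints: onDisk=false visible=true

        index.commit(true);  // like an explicit commit that opens a new searcher
        System.out.println("visible=" + index.isVisible("a"));
        // prints: visible=false
    }
}
```

In the model, the first commit corresponds to the ten-minute autoCommit from the question, and the second to the explicit commit that finally made the deletes show up in search results.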