
Can Solr DIH do atomic updates?


With Solr 4 came the ability to do atomic (partial) updates on existing documents in the index: one can match a document by its ID and replace the contents of just one field, or add further entries to a multivalued field. See http://wiki.apache.org/solr/Atomic_Updates
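For comparison with the DIH route, here is a minimal sketch of what an atomic update looks like in Solr's JSON update syntax, as posted to the regular update handler. Each changed field is wrapped in an object whose single key is the modifier. The helper `atomicUpdate` and the `title`/`author` fields are hypothetical, and `id` is assumed to be the schema's unique key.

```javascript
// Build one atomic-update document in Solr's JSON update syntax.
// The document is matched on "id" (assumed unique key); every other field
// is wrapped as {modifier: value}, where modifier is "set", "add" or "inc".
function atomicUpdate(id, changes) {
    var doc = { id: id };
    Object.keys(changes).forEach(function (field) {
        var change = changes[field];
        if (["set", "add", "inc"].indexOf(change.modifier) === -1) {
            throw new Error("unsupported modifier: " + change.modifier);
        }
        var wrapped = {};
        wrapped[change.modifier] = change.value;
        doc[field] = wrapped;
    });
    return doc;
}

// Replace one field and append to a multivalued one on document 123.
var body = JSON.stringify([
    atomicUpdate("123", {
        title: { modifier: "set", value: "New title" },
        author: { modifier: "add", value: "Smith, J" }
    })
]);
console.log(body);
// [{"id":"123","title":{"set":"New title"},"author":{"add":"Smith, J"}}]
```

Posting that array to the update handler touches only `title` and `author`; a plain document with the same ID would replace the whole document instead.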

Can atomic updates be done from DataImportHandler (DIH)?


Solution

  • The answer is "yes", using the ScriptTransformer, as I discovered through trial and error.

    The Solr documentation shows how to add an update attribute with the value "set", "add" or "inc" to a field node. If I create a test XML file with the requisite update attribute, it works fine when posted to the regular update handler. When the same data goes through DIH, however, the update attributes are ignored completely, even without any transformation.
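    A test file like the one described can be generated with a short sketch such as the following. It writes an <add><doc> message in which the unique key field is plain and every other field carries an update attribute. The helper names and the `views` field are hypothetical.

    ```javascript
    // Escape the characters that are significant in XML text and attributes.
    function escapeXml(s) {
        return String(s)
            .replace(/&/g, "&amp;")
            .replace(/</g, "&lt;")
            .replace(/>/g, "&gt;")
            .replace(/"/g, "&quot;");
    }

    // Build an <add><doc> message for the regular update handler: the key
    // field is written plainly, each other field gets update="set|add|inc".
    function atomicUpdateXml(idField, id, updates) {
        var lines = ["<add>", "  <doc>",
            '    <field name="' + escapeXml(idField) + '">' + escapeXml(id) + "</field>"];
        updates.forEach(function (u) {
            lines.push('    <field name="' + escapeXml(u.field) + '" update="' +
                escapeXml(u.modifier) + '">' + escapeXml(u.value) + "</field>");
        });
        lines.push("  </doc>", "</add>");
        return lines.join("\n");
    }

    var xml = atomicUpdateXml("id", 123, [
        { field: "author", modifier: "add", value: "Smith, J" },
        { field: "views", modifier: "inc", value: 1 }
    ]);
    console.log(xml);
    ```

    The printed message contains `<field name="author" update="add">Smith, J</field>`, which the regular update handler honours and DIH, as described above, does not.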

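    Here is a simplified version of the ScriptTransformer function I used to reintroduce the update modifier and get atomic updates working. Note the use of the Java HashMap. The function goes in a <script> element of data-config.xml and is attached to the entity with transformer="script:atomicTransformer".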

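    var atomicTransformer = function (row) {
        var author = row.get('author');
        if (author != null) {                  // skip rows without an author
            var authorMap = new java.util.HashMap();
            authorMap.put('add', String(author));
            row.put('author', authorMap);      // becomes {"add": "Smith, J"}
        }
        return row;                            // hand the row back to DIH
    };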
    

    This produces the following JSON in DIH debug mode:

    {
        "id": [
            123
        ],
        "author": [
            {
                "add": "Smith, J"
            }
        ]
    }
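
    The same pattern extends to the other modifiers. Below is a sketch with a small helper, `wrapAtomic` (a hypothetical name), that wraps any column in a {modifier: value} map; the columns `title`, `author` and `views` are assumptions for illustration. Inside the ScriptTransformer the real `java.util.HashMap` is used; the fallback stand-in exists only so the sketch can be exercised outside Solr, e.g. under Node.

    ```javascript
    // Inside DIH's ScriptTransformer the `java` global exposes JDK classes.
    // Outside the JVM this hypothetical stand-in implements only get/put.
    var HashMap = (typeof java !== "undefined") ? java.util.HashMap : function () {
        var m = new Map();
        this.get = function (k) { return m.has(k) ? m.get(k) : null; };
        this.put = function (k, v) { var old = this.get(k); m.set(k, v); return old; };
    };

    // Replace a column's value with a {modifier: value} map so DIH emits an
    // atomic update for that field. Columns absent from the row are left alone.
    function wrapAtomic(row, column, modifier) {
        var value = row.get(column);
        if (value === null || value === undefined) {
            return;
        }
        var map = new HashMap();
        map.put(modifier, value);
        row.put(column, map);
    }

    // Hypothetical columns: 'title' is overwritten, 'author' is appended to a
    // multivalued field, 'views' is incremented by the row's value.
    var atomicTransformer = function (row) {
        wrapAtomic(row, "title", "set");
        wrapAtomic(row, "author", "add");
        wrapAtomic(row, "views", "inc");
        return row;
    };
    ```

    The `id` column is deliberately left unwrapped, since Solr needs the plain unique key to find the document being updated.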
    

    Multivalued fields are no problem either: put a Java ArrayList into the HashMap instead of a string, and each element is added to the field.

    var atomicTransformer = function (row) {
        // Values to append to the multivalued 'fruit' field in one update
        var fruits = new java.util.ArrayList();
        fruits.add("banana");
        fruits.add("apple");
        fruits.add("pear");
        var fruitMap = new java.util.HashMap();
        fruitMap.put('add', fruits);   // {"add": ["banana", "apple", "pear"]}
        row.put('fruit', fruitMap);
        return row;
    };
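
    In practice the values usually come from the source row rather than being hard-coded. The sketch below assumes, hypothetically, that the `fruit` column holds a ';'-separated string such as "banana;apple;pear" and turns it into an ArrayList for an "add". As before, the stand-ins for `java.util.ArrayList` and `java.util.HashMap` are only there so the code can run outside the JVM; whitespace is trimmed with a regex rather than `String.prototype.trim` in case the script engine predates ES5.

    ```javascript
    // Hypothetical stand-ins used only when the `java` global is absent.
    var ArrayList = (typeof java !== "undefined") ? java.util.ArrayList : function () {
        var items = [];
        this.add = function (v) { items.push(v); return true; };
        this.get = function (i) { return items[i]; };
        this.size = function () { return items.length; };
    };
    var HashMap = (typeof java !== "undefined") ? java.util.HashMap : function () {
        var m = new Map();
        this.get = function (k) { return m.has(k) ? m.get(k) : null; };
        this.put = function (k, v) { var old = this.get(k); m.set(k, v); return old; };
    };

    // Split the assumed ';'-separated 'fruit' column and append every
    // non-empty, trimmed entry to the multivalued field via an "add" map.
    var atomicTransformer = function (row) {
        var raw = row.get("fruit");
        if (raw === null || raw === undefined) {
            return row;                      // nothing to add for this row
        }
        var parts = String(raw).split(";");
        var fruits = new ArrayList();
        for (var i = 0; i < parts.length; i++) {
            var fruit = parts[i].replace(/^\s+|\s+$/g, "");
            if (fruit.length > 0) {
                fruits.add(fruit);
            }
        }
        var fruitMap = new HashMap();
        fruitMap.put("add", fruits);
        row.put("fruit", fruitMap);
        return row;
    };
    ```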