solr dataimporthandler

How to execute a query after a delta-import finishes in Solr


I run delta imports with the DataImportHandler. I use a delete_item table to get the records that should be deleted from the Solr index.

How can I execute the query

TRUNCATE TABLE delete_item

after the delta import has finished?

Can this be done with Solr, or should I do it with a cron job?


Solution

  • There is no out-of-the-box, configure-it-in-XML solution for this. From Solr's perspective this makes sense: Solr wants to manage itself, not other data sources. But there are a couple of things you can do.

    Personally, I would recommend (2), as it does not require writing custom code that has to be deployed to your Solr instance. That makes the solution transferable to SolrCloud.

    1. A Custom EventListener

    As mentioned in this answer https://stackoverflow.com/a/9100844/2160152 to "Solr - How can I receive notifications of failed imports from my DataImportHandler?", you can write a custom EventListener. That listener can connect to your database and execute the truncate.

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;

    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.EventListener;

    public class ImportEndListener implements EventListener {

        @Override
        public void onEvent(Context aContext) {
            // try-with-resources closes the statement and the connection, even on failure
            try (Connection connection = getConnection();
                 Statement statement = connection.createStatement()) {
                statement.executeUpdate("TRUNCATE TABLE delete_item");
            } catch (SQLException e) {
                // TODO think of something better
                e.printStackTrace();
            }
        }

        private Connection getConnection() throws SQLException {
            // TODO get a connection to your database, somehow
            throw new SQLException("getConnection() is not implemented yet");
        }

    }
    

    That listener needs to be compiled and bundled in a jar file. Then you need to make your jar and all its dependencies available to Solr as described in the wiki (the article is about plugins, but it holds true for any custom code).
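
    The getConnection() stub in the listener is left open. A minimal sketch of one way to fill in the database side, assuming plain JDBC through DriverManager, might look like the following. DeleteItemCleaner and its method and parameter names are hypothetical, and the JDBC URL and credentials have to come from your own setup. The table name is checked against a simple identifier pattern because it cannot be bound as a statement parameter.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.regex.Pattern;

/** Hypothetical helper; class, method, and parameter names are illustrative only. */
final class DeleteItemCleaner {

    // Accept only plain SQL identifiers, since a table name cannot be a bound parameter.
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    private DeleteItemCleaner() {
    }

    /** Builds the TRUNCATE statement, rejecting anything that is not a plain identifier. */
    static String truncateSql(String table) {
        if (table == null || !IDENTIFIER.matcher(table).matches()) {
            throw new IllegalArgumentException("Not a plain table name: " + table);
        }
        return "TRUNCATE TABLE " + table;
    }

    /** Opens a connection, truncates the table, and closes everything again. */
    static void truncate(String jdbcUrl, String user, String password, String table)
            throws SQLException {
        String sql = truncateSql(table);
        try (Connection connection = DriverManager.getConnection(jdbcUrl, user, password);
             Statement statement = connection.createStatement()) {
            statement.executeUpdate(sql);
        }
    }
}
```

    The listener's onEvent method could then call DeleteItemCleaner.truncate(...) with your own URL, credentials, and "delete_item". The listener itself is registered through the onImportEnd attribute of the document element in data-config.xml, set to the fully qualified class name, and the JDBC driver jar has to be on Solr's classpath as well.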

    2. Redesign the 'deleted_item' Table

    As shown in the blog entry "Data Import Handler – removing data from index", you could extend your table with a timestamp column, deleted_at. You would then need to extend your onDelete trigger to insert the current time into that column.

    Once that is in place, you could reformulate the deletedPkQuery attribute of your entity as follows:

    deletedPkQuery="SELECT id FROM deleted_item WHERE deleted_at > '${dataimporter.last_index_time}'"
    

    That way there is no need to truncate the table, unless you want to reclaim the disk space.
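
    To illustrate why the timestamp makes truncation unnecessary, here is a small in-memory sketch of the filter that the deletedPkQuery applies. It is not Solr code: DeletedItemFilter, the map standing in for the table, and the sample timestamps are all made up for the example. Rows deleted before the last import simply stop matching, so they can stay in the table.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for the deleted_item table; not part of Solr. */
final class DeletedItemFilter {

    private DeletedItemFilter() {
    }

    /** Mirrors "WHERE deleted_at > last_index_time": ids deleted after the last import. */
    static List<Integer> idsDeletedSince(Map<Integer, Instant> deletedAtById,
                                         Instant lastIndexTime) {
        List<Integer> ids = new ArrayList<>();
        for (Map.Entry<Integer, Instant> row : deletedAtById.entrySet()) {
            if (row.getValue().isAfter(lastIndexTime)) {
                ids.add(row.getKey());
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<Integer, Instant> table = new LinkedHashMap<>();
        table.put(1, Instant.parse("2024-01-01T10:00:00Z")); // before the last import
        table.put(2, Instant.parse("2024-01-02T10:00:00Z")); // after the last import
        table.put(3, Instant.parse("2024-01-03T10:00:00Z")); // after the last import

        Instant lastIndexTime = Instant.parse("2024-01-01T12:00:00Z");
        System.out.println(idsDeletedSince(table, lastIndexTime)); // prints [2, 3]
    }
}
```

    Row 1 is still in the table but is no longer selected, which is the same effect the truncate was meant to achieve.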