grailsgrails-ormcriteriahibernate-criteriadetachedcriteria

GORM fetch and update in batches


I need to update a single property on a large set of data in Oracle database with grails 2.5

Right now my code looks similarly to this:

    List booksToUpdate = []
    boolean moreBooks = true
    int offset = 0

    while (moreBooks) {
        def fetchedBooks = []
        if('Europe'){
            fetchedBooks = Book.findAllByAuthorInList(getEuropeanAuthors(),
                    [max: BATCHSIZE, offset: offset])
        } else {
            fetchedBooks = Book.findAllByAuthorInListAndPublishYear(getEnglishAuthors(), '1999',
                    [max: BATCHSIZE, offset: offset])
        }

        booksToUpdate.addAll(fetchedBooks)

        moreBooks = fetchedBooks.size() == BATCHSIZE
        offset += BATCHSIZE
    }

    booksToUpdate.each { book ->
        book.copyright = '2020'
        book.save(flush: true, failOnError: true)
    }

I would like to batch my updates for performance. Also, findAll query differs very slightly and it would be nice to build search criteria condifitionally. Ideally I want something like this:

    while (moreBooks) {
        def fetchedBooks = []

        def criteria = new DetachedCriteria(Book)
        criteria.build [max: BATCHSIZE, offset: offset] {
            List relevantAuthors = []
            if('Europe') {
                relevantAuthors = getEuropeanAuthors()
                eq 'publishYear', '1999'
            } else {
                relevantAuthors = getEnglishAuthors()
            }
            inList 'author', relevantAuthors
        }
        criteria.updateAll(copyright:'2020') //batch update

        moreBooks = fetchedBooks.size() == BATCHSIZE
        offset += BATCHSIZE
    }

Is there a way to do this? Doesn't have to be with DetachedCriteria. I looked at the guide but I can't find anything about passing max and offset. Is there a better way to make the code more elegant without compromising its performance?


Solution

  • updateAll updates everything, even if default max and offset are set. And I realized I need to get a list of updated instances back. So trying to force everything into a single updateAll that returns only count didn't make sense. Ended up with this:

    List updateCopyright(String geo) {
        DetachedCriteria<Book> findCr = getQueryFromGeo(geo)
        
        List<String> updatedBookIds = []
        boolean moreBooks = true
        int offset = 0
        
        while (moreBooks) {
            List<Book> fetchedBooks = findCr.list([max: BATCHSIZE, offset: offset])
            List currentBatchIds = fetchedBooks*.id
    
            def updateCr = Book.where {
                inList 'id', currentBatchIds
            }
            int updatedCount = updateCr.updateAll(copyright: '2020')
    
            moreBooks = updatedCount == BATCHSIZE
            offset += BATCHSIZE
    
            updatedBookIds.addAll(currentBatchIds)
        }
    
        updatedBookIds
    }
    
    
    private DetachedCriteria<Book> getQueryFromGeo(String geo) {
        switch (geo) {
            case 'Europe':
                return Book.where {
                    author in getEuropeanAuthors()
                }
            case 'England':
                return Book.where {
                    author in getEnglishAuthors() &&
                            publishYear == '1999'
                }
            default:
                return Book.where {}
        }
    }
    

    Also played with building my query condifionally with createCriteria but readability suffered and no compile-time check that comes with where queries