Tags: java, spring, spring-data-jpa, spring-batch, lazy-loading

How to avoid LazyInitializationException when accessing nested lazy collections in Spring Batch with parallel chunk processing?


I’m working on a Spring Batch job with parallel chunk processing. The problem I’m facing is a LazyInitializationException due to nested lazy-loaded collections in my JPA entities.

I’m using JpaPagingItemReader for reading, JpaTransactionManager for managing transactions, and SimpleAsyncTaskExecutor for parallel processing.

Setup (simplified example):

Customer entity:

@Entity
public class Customer {
    
    @Id
    private Long id;
    
    @ManyToOne
    @JoinColumn(name = "order_id")
    private Order order;
}

Order entity:

The @OneToMany relation is FetchType.LAZY by default.

@Entity
public class Order {

    @Id
    private Long id;

    @OneToMany(mappedBy = "order", cascade = CascadeType.ALL, orphanRemoval = true)
    private List<Type> types;

    // Other @OneToMany collection fields...

}

Spring Batch Step Configuration:

I’m using a JpaPagingItemReader and an ItemProcessor. The step is parallelized using a SimpleAsyncTaskExecutor.

@Bean
public JpaPagingItemReader<Customer> customerItemReader(EntityManagerFactory entityManagerFactory) {
    return new JpaPagingItemReaderBuilder<Customer>()
            .name("customerItemReader")
            .entityManagerFactory(entityManagerFactory)
            .queryString("SELECT c FROM Customer c")
            .pageSize(100)
            .build();
}

@Bean
public Step processCustomersStep(JobRepository jobRepository,
                         PlatformTransactionManager transactionManager,
                         JpaPagingItemReader<Customer> customerItemReader,
                         ItemProcessor<Customer, Customer> customerItemProcessor,
                         ItemWriter<Customer> customerItemWriter,
                         TaskExecutor taskExecutor) {
    return new StepBuilder("processCustomersStep", jobRepository)
            .<Customer, Customer>chunk(100, transactionManager)
            .reader(customerItemReader)
            .processor(customerItemProcessor)
            .writer(customerItemWriter)
            .taskExecutor(taskExecutor)
            .build();
}

When running the batch job, I get the following error when accessing the lazy-loaded order or its nested types collection in the ItemProcessor:

org.hibernate.LazyInitializationException: failed to lazily initialize a collection of role: com.myapp.Order.types: could not initialize proxy - no Session

What I’ve tried:

  1. Fetching with JOIN FETCH: I can’t use JOIN FETCH because the Order entity contains multiple nested lazy-loaded collections, leading to large, inefficient queries.
  2. Relying on JpaTransactionManager: I assumed it would manage the transaction scope properly, but the LazyInitializationException still occurs when the ItemProcessor accesses the nested lazy-loaded collections.

Question:

How can I prevent the LazyInitializationException in a Spring Batch step with parallel chunks when using JpaPagingItemReader and JpaTransactionManager?

Should I handle the EntityManager differently, or is there another pattern to process entities with nested lazy-loaded collections in parallel?

UPDATE (processor, writer and JPA transaction manager)

Item processor:

In the ItemProcessor, I’m creating a new Customer instance and copying the nested Order and its associated Types. This requires accessing the lazy-loaded collections in the ItemProcessor, which is where the LazyInitializationException occurs.

@Component
public class CustomerItemProcessor implements ItemProcessor<Customer, Customer> {

    @Override
    public Customer process(Customer customer) throws Exception {
        Customer newCustomer = new Customer();
        Order order = new Order();
        order.setTypes(customer.getOrder().getTypes());
        newCustomer.setOrder(order);
        return newCustomer; 
    }

}

Item writer:

The ItemWriter is a JpaItemWriter built from the injected EntityManagerFactory; it persists the processed entities.

@Bean
public JpaItemWriter<Customer> customerItemWriter(EntityManagerFactory entityManagerFactory) {
    return new JpaItemWriterBuilder<Customer>()
        .entityManagerFactory(entityManagerFactory)
        .build();
}

JPA transaction manager:

The JpaTransactionManager is used to manage the transaction scope within the batch job.

@Bean
public PlatformTransactionManager transactionManager(EntityManagerFactory entityManagerFactory) {
    return new JpaTransactionManager(entityManagerFactory);
}

Solution

  • The LazyInitializationException occurs because the ItemProcessor runs on a separate thread after the transaction in which the entities were read has completed, so the persistence context is closed and the lazy collections can no longer be initialized.

    To solve this, I implemented a custom paging ItemReader that eagerly loads the required collections during the read phase, inside the transaction.

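    // Imports assume Spring Batch 5 with Jakarta Persistence
    import java.util.List;
    import java.util.concurrent.CopyOnWriteArrayList;

    import jakarta.persistence.EntityManager;
    import jakarta.persistence.EntityManagerFactory;
    import jakarta.persistence.EntityTransaction;

    import org.springframework.batch.item.database.AbstractPagingItemReader;
    import org.springframework.dao.DataAccessResourceFailureException;
    import org.springframework.util.Assert;

    public class CustomerItemReader extends AbstractPagingItemReader<Customer> {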
    
        private final EntityManagerFactory entityManagerFactory;
    
        private EntityManager entityManager;
    
        public CustomerItemReader(EntityManagerFactory entityManagerFactory, int pageSize) {
            super();
            this.entityManagerFactory = entityManagerFactory;
            setPageSize(pageSize);
            setName("customerItemReader");
        }
        
        @Override
        public void afterPropertiesSet() throws Exception {
            super.afterPropertiesSet();
            Assert.state(entityManagerFactory != null, "EntityManagerFactory cannot be null");
        }
    
        @Override
        protected void doOpen() throws Exception {
            super.doOpen();
            entityManager = entityManagerFactory.createEntityManager();
            if (entityManager == null) {
                throw new DataAccessResourceFailureException("Unable to obtain an EntityManager");
            }
        }
    
        @Override
        protected void doReadPage() {
            EntityTransaction tx = entityManager.getTransaction();
            tx.begin();
    
            entityManager.flush();
            entityManager.clear();
    
            if (results == null) {
                results = new CopyOnWriteArrayList<>();
            } else {
                results.clear();
            }
    
            // Step 1: Fetch Customers and their Orders
            List<Customer> customers = entityManager.createQuery("""
                    SELECT c FROM Customer c
                    LEFT JOIN FETCH c.order
                    ORDER BY c.id
                """, Customer.class)
                .setFirstResult(getPage() * getPageSize())
                .setMaxResults(getPageSize())
                .getResultList();
    
            if (customers.isEmpty()) {
                tx.commit();
                return;
            }
    
            // Step 2: Fetch nested Order collections using IDs
            List<Long> orderIds = customers.stream()
                .map(Customer::getOrder)
                .filter(order -> order != null) // LEFT JOIN: a customer may have no order
                .map(Order::getId)
                .distinct()
                .toList();
    
            // Step 3: Load Order.types
            entityManager.createQuery("""
                    SELECT DISTINCT o FROM Order o
                    LEFT JOIN FETCH o.types
                    WHERE o.id IN :orderIds
                """, Order.class)
                .setParameter("orderIds", orderIds)
                .getResultList();
    
            // Repeat step 3 to load other @OneToMany collections one by one to avoid MultipleBagFetchException...
    
            tx.commit();
    
            results.addAll(customers);
        }
    
        @Override
        protected void doClose() throws Exception {
            entityManager.close();
            super.doClose();
        }
    }
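
The "repeat step 3" comment in doReadPage() can be made concrete with a small helper. The following is only a sketch: the class OrderFetchQueries and its methods are hypothetical names, not part of Spring Batch or JPA, and any collection name other than types (for example items) is a placeholder for the other @OneToMany fields that the question elides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Sketch: builds one JPQL fetch query per Order collection, so that each
 * collection is loaded by its own query instead of one query joining several bags.
 */
final class OrderFetchQueries {

    // The attribute name is concatenated into JPQL, so only plain identifiers are accepted.
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    private OrderFetchQueries() {
    }

    /** Returns the JPQL that fetches a single collection of Order for the bound :orderIds. */
    static String fetchQueryFor(String collection) {
        if (collection == null || !IDENTIFIER.matcher(collection).matches()) {
            throw new IllegalArgumentException("Not a valid attribute name: " + collection);
        }
        return "SELECT DISTINCT o FROM Order o LEFT JOIN FETCH o." + collection
                + " WHERE o.id IN :orderIds";
    }

    /** Returns one fetch query per collection name, in the order given. */
    static List<String> fetchQueriesFor(List<String> collections) {
        List<String> queries = new ArrayList<>();
        for (String collection : collections) {
            queries.add(fetchQueryFor(collection));
        }
        return queries;
    }
}
```

Inside doReadPage(), before tx.commit(), each generated string could be run with entityManager.createQuery(jpql, Order.class).setParameter("orderIds", orderIds).getResultList(). The results can be ignored, since the point is only to initialize the collections on the Order instances already held in the persistence context. It is probably worth skipping these queries when orderIds is empty.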
    

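    This approach loads each required collection with its own query inside the reader's transaction, so the processor only sees initialized collections and no single query fetches more than one bag (which would raise MultipleBagFetchException).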
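
To use the custom reader, it can replace the JpaPagingItemReader bean from the question. A minimal sketch, assuming the configuration class lives in the same package as CustomerItemReader (the class name CustomerReaderConfig is made up for illustration) and that the job does not need to be restartable:

```java
import jakarta.persistence.EntityManagerFactory;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
class CustomerReaderConfig {

    // Page size of 100 mirrors the chunk size used in processCustomersStep.
    @Bean
    public CustomerItemReader customerItemReader(EntityManagerFactory entityManagerFactory) {
        CustomerItemReader reader = new CustomerItemReader(entityManagerFactory, 100);
        // Assumption: no restart needed. Saved reader state is not reliable
        // when several threads pull from the same reader, so it is disabled here.
        reader.setSaveState(false);
        return reader;
    }
}
```

The reader parameter of processCustomersStep would then need to be typed as CustomerItemReader (or ItemReader<Customer>) instead of JpaPagingItemReader<Customer>.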


    Related: If you’re using an aggregate ItemReader that returns a List<T> per read (such as AggregatePagingItemReader), you may run into similar LazyInitializationException issues when accessing nested lazy-loaded collections in parallel chunk processing.

    I’ve written a separate answer that shows how to eagerly load nested @ElementCollection fields using separate batch queries to avoid LazyInitializationException, MultipleBagFetchException, and N+1 problems in that scenario.