spring-bootjpacrud

spring boot saveAll() hangs when called from @Scheduled, but works if called from a controller


saveAll(myEntities) just hangs if my service is called from a @Scheduled job sub module, but works if called from a @RestController submodule.

Other @Schuled jobs in the same module work.

However, there is no discernible difference, and I don't know how to debug it or to fix it.

There are some similar posts: Spring boot repository does not save to the DB if called from scheduled job But these have different code and their solution is not applicable.

The project has this structure:

parent (no code)
- api-service
- schedule-service
- common

The Services and Repositories are in common.

The main method of the schedule-services is:

@SpringBootApplication(scanBasePackages = { "com.xx.yyy"})
@EntityScan("com.xx.yyy")
@EnableFeignClients
@EnableScheduling
public class Main {
    public static void main(final String[] args) {
        SpringApplication.run(Main.class, args);
    }
}

Where all our code is in packages under com.xx.yyy

Datasource is very standard.

The service service looks like this:

package com.xx.yy.service

@Service
public class MyServiceImpl implements MyService

    @Autowired
    MyEntityRepository myEntityRepository;

    public int importStuff() {
        do {
            List<MyEntity> entities = readStuff();
            log.info("About to save " + entities.size());
            myEntityRepository.saveAll()
            log.info("saved");
        } while (moreStuff..)
    }
}

The repository looks like this:

package com.xx.yyy.dao
@Repository
public interface MyEntityRepository extends JpaRepository<MyEntity, MyEntityCompositeId> {
}

I also tried this:

package com.xx.yy.service

@Service
public class MyServiceImpl implements MyService

    @Autowired
    MyEntityRepository myEntityRepository;

    public void importStuff() {
        do {
            List<MyEntity> entities = readStuff();
            log.info("About to save " + entities.size());
            mySaveAll(entities)
            log.info("saved");
        } while (moreStuff..)
    }

    @Transactional
    public void mySaveAll(List<MyEntity> entities) {
        myEntityRepository.saveAll(entities);
        myEntityRepository.flush()
    }
}

The scheduler looks like this:

package com.xx.yyy.scheduler;
@Component
public class MyScheduler {
  @Autowired
  private MyService myService;
  
  @Async
  @Schuled(cron="xxx")
  public void myJob() {
     myService.importStuff();
  }
}

There are no exceptions, nothing I can debug, it just hangs when it gets to saveAll. Any suggestions of what I can try?

The service works perfectly when tested from an API controller. It saves chunks (i.e. not one transaction for all saves), which I can see by doing a select count on the table its inserting batches into periodically as its running.

Is there an easy way I can create and use an entity manager, if the one provided is broken some how, without having to write my own version of saveAll (i.e. still use the repository and CRUD it provides?)

Spring Boot 2.7, java 8.

FYI It works perfectly if called from a contoller in the api module.

the api modules main class is this:

package com.xx.yyy;

@SpringBootApplication(scanBasePackages = { "com.xx.yyy"})
@EntityScan("com.xx.yyy")
@EnableFeignClients
@EnableScheduling
public class Main {
    public static void main(final String[] args) {
        SpringApplication.run(Main.class, args);
    }
}

and the controller looks like this:

package com.xx.yyy
@RestController
@RequestMapping("/loadStuff")
public class MyController {

  @Autowired
  private MyService myService;

    @GetMapping("/import")
    public int loadStuff() throws Exception {
         myService.importStuff();
    }
}

It works fine when the service is called from the controller, but hags on saveAll when called from the scheduled job.

At this point I am desperate, maybe there is a way to manually create my own entity manager, transaction manager etc, as a workaround? I don't really know how to do this and still use the CRUD stuff SB provides.

=== Update 1 ===

I have been trying various combinations, and notice that if I modify the service to save just the first item using save() rather than saveAll(), it actually saves (and commits) that record to the database without issue:

@Service public class MyServiceImpl implements MyService

@Autowired
MyEntityRepository myEntityRepository;

public void importStuff() {
    do {
        List<MyEntity> entities = readStuff();
         
        log.info("saving " + entities.get(0));  // this works
        myEntityRepository.save(entities.get(0)); // this works
        log.info("saved " + entities.get(0));  // This works


        log.info("About to save " + entities.size());
        mySaveAll(entities)  // this hangs.
        log.info("saved");  // this never gets reached by the code.
    } while (moreStuff..)
}

So why would save work, and saveAll only work if the same code is called by a controller and not a @scheduled, given both have the same datasource and app config?


Solution

  • I found the solution, and it was quite embarrassing.

    The reason the job was not running on our staging env, is because there was another @Scheduled job scheduled at the same minute, and the scheduler is single threaded, and if two start at similar times, one will be discarded.

    The second issue was that if I run the app on my local PC pointing to my local PC's Postgres, inserting a batch of 25,000 records using saveALL() takes around 3 seconds. Running the same app on our staging env which is AWS ECS + RDS Postgres takes about 7s.

    However running on my local PC, pointing to a RDS Postgres through our corporate VPN, the same saveAll() for a batch size of 25k takes a 8 hours, making it appear to hang.

    A better solution might be to do a bulk insert (rather than many individual inserts in a loop inside a transaction like saveAll() does) but I havent found a resource which definitively says what the (practical) size limitation of a bulk insert is for Postgres. e.g. if I do this

    insert into myTable (col1, col2, col3, .. coln) values (val1..valn),(va2..valn2),(va3n..valn3)...(valX..valX*n)

    How do I calculate the maximum number of rows I can insert this way (i.e. what is the value of X? Some posts say its based on a maximum size of the overall sql statement 2147483648 chars. This implies if a list of values for one row is around 2k, the max I can insert is around 1m rows in a single statement, which begs the question why doesn't saveAll() work like this in the first place?