java spring-boot

Options for storing data in memory in a Spring Boot project


My goal is to store a bunch of unique IDs (approx. 500k) in memory when Spring Boot starts.

Trying to explore my options here.

  1. Can I store the unique IDs in a data structure like a HashSet? The HashSet would be populated during Spring Boot initialization and read from then on.

  2. How about using an in-memory database like H2 or SQLite?

I feel like option 1 is easier to implement, needs one fewer dependency, and could also save some memory. I'm not sure about any potential issues with option 1, though.

Any other viable options are greatly appreciated.

Thanks!


Solution

  • Holding 500k unique IDs on the JVM heap is not a good idea, even if resources seem abundant now. Loading everything at startup adds latency to every boot, heap sizing and GC tuning get harder as the data set grows, and every instance you scale out to has to carry its own copy. That locks you into a rigid design that is a headache to change later.

    A better solution: instead of loading all the data into your application's memory, keep it in Redis, an in-memory data store that runs outside the JVM. Redis is well suited to fast membership lookups like yours and is built for concurrent access. With Spring Boot 3.3.3 and Java 21, Spring Data Redis makes the integration straightforward.

    Add the dependency:

    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-redis</artifactId>
    </dependency>
    

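    Add this to application.properties (Spring Boot 3.x reads the connection settings from spring.data.redis.*; the older spring.redis.* keys apply to 2.x only):

    spring.data.redis.host=localhost
    spring.data.redis.port=6379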
    

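    Then do something like this:

    @Autowired
    private RedisTemplate<String, String> redisTemplate;

    // Adds all IDs to the Redis set "uniqueIds" in a single SADD call.
    public void storeUniqueIds(Set<String> uniqueIds) {
        if (uniqueIds.isEmpty()) {
            return; // SADD needs at least one member
        }
        redisTemplate.opsForSet().add("uniqueIds", uniqueIds.toArray(new String[0]));
    }

    // isMember returns a Boolean that can be null (e.g. in a pipeline or transaction),
    // so compare against Boolean.TRUE instead of auto-unboxing.
    public boolean existsInSet(String id) {
        return Boolean.TRUE.equals(redisTemplate.opsForSet().isMember("uniqueIds", id));
    }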
    

    Redis will handle the concurrency, provide constant-time membership checks, and let your app scale across instances without eating into your heap memory. Plus, it’s built for read-heavy workloads like yours.
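    With around 500k IDs, sending everything in one command makes for a very large request, so one option is to load the set in batches. The following is only a minimal sketch in plain Java: `IdBatcher`, `sendInBatches`, and the `sender` callback are hypothetical names used for illustration, and the callback stands in for whatever actually writes a batch to Redis.

    ```java
    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.List;
    import java.util.function.Consumer;

    /** Hypothetical helper: hands a large ID collection to a sender in fixed-size batches. */
    final class IdBatcher {

        private IdBatcher() {
        }

        /**
         * Passes the IDs to the sender in batches of at most batchSize elements.
         * Returns the total number of IDs handed to the sender.
         */
        static int sendInBatches(Collection<String> ids, int batchSize, Consumer<List<String>> sender) {
            if (batchSize <= 0) {
                throw new IllegalArgumentException("batchSize must be positive");
            }
            List<String> batch = new ArrayList<>(batchSize);
            int sent = 0;
            for (String id : ids) {
                batch.add(id);
                if (batch.size() == batchSize) {
                    // Hand over an immutable copy so the buffer can be reused.
                    sender.accept(List.copyOf(batch));
                    sent += batch.size();
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                // Flush the final, possibly smaller, batch.
                sender.accept(List.copyOf(batch));
                sent += batch.size();
            }
            return sent;
        }
    }
    ```

    With the redisTemplate field from the snippet above, the sender could be `batch -> redisTemplate.opsForSet().add("uniqueIds", batch.toArray(new String[0]))`. The batch size is something to tune for your setup; a value such as 1,000 is just an arbitrary starting point, not a measured recommendation.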

    If you still insist on keeping the data inside the application, you could use H2 with file-based storage so the data does not have to be reloaded every time the app restarts (though, with a file, you’re no longer truly in-memory).

    Dependency:

    <dependency>
        <groupId>com.h2database</groupId>
        <artifactId>h2</artifactId>
        <scope>runtime</scope>
    </dependency>
    

    Properties:

    spring.datasource.url=jdbc:h2:file:./data/uniqueIds
    spring.datasource.driverClassName=org.h2.Driver
    spring.datasource.username=sa
    spring.datasource.password=password
    spring.h2.console.enabled=true
    

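    Lookup code (assumes a table named ids with an id column already exists):

    @Autowired
    private JdbcTemplate jdbcTemplate;

    public boolean idExists(String id) {
        String query = "SELECT COUNT(*) FROM ids WHERE id = ?";
        // Use the varargs overload; the Object[] variant is deprecated since Spring 5.3.
        Integer count = jdbcTemplate.queryForObject(query, Integer.class, id);
        // Guard against a null result before unboxing.
        return count != null && count > 0;
    }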
    

    However, if you’re going this route, you’re effectively managing a database, so the claim that you’re storing it “in memory” is no longer valid. Redis is still superior for your use case, providing faster lookups, true in-memory performance, and better scalability.