java http caching apache-httpclient-4.x

Java Apache HTTP client: initialize persistent file cache


I am trying to configure persistent HTTP caching using the org.apache.http.impl.client.cache.CachingHttpClients builder. However, when I configure a cache directory, the cache never seems to be read back from disk.

I tried to set up persistent caching using setCacheDir, i.e.,

CachingHttpClients.custom()
  .setCacheDir(cacheDir)
  .setDeleteCache(false)
  .build();

(see below for a complete example)

The behaviour I'm seeing:

The cache entries that were written to disk are not picked up after a restart, and I haven't been able to find a way to make the client load them.

How do I initialize Apache's HTTP cache so that it persists across restarts?


Minimal reproducible example. Running this multiple times results in a "Cache miss" every time, although there are cache entries being written to disk. I would expect reruns to use the cache that was written to disk. Note that I do see a cache hit if I perform two requests to the same URL within the same run.

    File cacheDir = Path.of(System.getProperty("java.io.tmpdir")).resolve("my-http-cache").toFile();
    if (!cacheDir.exists() && !cacheDir.mkdirs()) {
      throw new RuntimeException("Could not create cache directory " + cacheDir + ".");
    }
    try (var client = CachingHttpClients.custom()
      .setCacheDir(cacheDir)
      .setDeleteCache(false)
      .useSystemProperties()
      .build()) {
      HttpCacheContext context = HttpCacheContext.create();
      CloseableHttpResponse response = client.execute(new HttpGet("https://api.github.com/repos/finos/common-domain-model"), context);

      CacheResponseStatus responseStatus = context.getCacheResponseStatus();
      switch (responseStatus) {
        case CACHE_HIT:
          System.out.println("Cache hit!");
          break;
        case CACHE_MODULE_RESPONSE:
          System.out.println("The response was generated directly by the caching module");
          break;
        case CACHE_MISS:
          System.out.println("Cache miss!");
          break;
        case VALIDATED:
          System.out.println("Cache hit after validation");
          break;
      }
    }

Solution

  • Apache's HTTP caching keeps a cache entry for each eligible HTTP response. Each entry points to an abstract "resource" object that holds the cached response. With CachingHttpClients.custom().setCacheDir(cacheDir), that resource is a file: responses are saved to disk rather than held in memory, which reduces memory usage. The cache entries themselves, however, are still kept in memory, so they do not survive a restart.
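
To make that split concrete, here is a toy model written against the JDK only. It is not Apache's implementation: the class ToyFileBackedCache and its methods are made up for illustration. Bodies are written to files, while the key-to-file index is a plain in-memory map, so a second instance pointed at the same directory has the files on disk but no index to reach them.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Stream;

/** Toy model (hypothetical, not Apache code): bodies on disk, index in memory. */
public class ToyFileBackedCache {
  private final Path dir;
  // In-memory index: lost when the JVM exits, even though the files remain.
  private final Map<String, Path> index = new HashMap<>();

  public ToyFileBackedCache(Path dir) {
    this.dir = dir;
  }

  /** Stores the body in a new file and remembers which file belongs to the key. */
  public void put(String key, String body) throws IOException {
    Path file = Files.createTempFile(dir, "entry", ".body");
    Files.write(file, body.getBytes(StandardCharsets.UTF_8));
    index.put(key, file);
  }

  /** Returns the cached body, or null on a cache miss. */
  public String get(String key) throws IOException {
    Path file = index.get(key);
    return file == null ? null : new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("toy-cache");
    ToyFileBackedCache firstRun = new ToyFileBackedCache(dir);
    firstRun.put("https://example.org/a", "hello");
    System.out.println("same run: " + firstRun.get("https://example.org/a"));

    // Simulated restart: same directory, fresh (empty) in-memory index.
    ToyFileBackedCache secondRun = new ToyFileBackedCache(dir);
    System.out.println("after restart: " + secondRun.get("https://example.org/a"));
    try (Stream<Path> files = Files.list(dir)) {
      System.out.println("files still on disk: " + files.count());
    }
  }
}
```

Running main prints the body for the first lookup and null for the lookup after the simulated restart, while the file count shows that the body is still on disk. This mirrors the behaviour in the question: a hit within one run, a miss in the next.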

    The following implementation could be used to persist cache entries as well:

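    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.lang.reflect.Field;
    import java.util.Map;

    import org.apache.http.annotation.Contract;
    import org.apache.http.annotation.ThreadingBehavior;
    import org.apache.http.client.cache.HttpCacheEntry;
    import org.apache.http.client.cache.HttpCacheUpdateCallback;
    import org.apache.http.impl.client.cache.CacheConfig;
    import org.apache.http.impl.client.cache.ManagedHttpCacheStorage;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    /**
     * A variant of {@link org.apache.http.impl.client.cache.ManagedHttpCacheStorage}
     * whose cache entries survive restarts.
     */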
    @Contract(threading = ThreadingBehavior.SAFE)
    public class PersistentHttpCacheStorage extends ManagedHttpCacheStorage {
      private static final Logger LOGGER = LoggerFactory.getLogger(PersistentHttpCacheStorage.class);
      private static final String ENTRIES_FILE_NAME = "ENTRIES";
    
      private Map<String, HttpCacheEntry> entries;
      private final File cacheDir;
      private final File entriesFile;
    
      public PersistentHttpCacheStorage(CacheConfig config, File cacheDir) {
        super(config);
        this.cacheDir = cacheDir;
        this.entriesFile = new File(cacheDir, ENTRIES_FILE_NAME);
    
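        // A hack to access the entries map of the superclass via reflection.
        // This relies on an internal field name and may break with other HttpClient versions.
        try {
          Field f = ManagedHttpCacheStorage.class.getDeclaredField("entries");
          f.setAccessible(true);
          @SuppressWarnings("unchecked")
          Map<String, HttpCacheEntry> superEntries = (Map<String, HttpCacheEntry>) f.get(this);
          this.entries = superEntries;
        } catch (NoSuchFieldException | IllegalAccessException e) {
          throw new RuntimeException(e);
        }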
      }
    
      public void initialize() {
        try {
          if (!cacheDir.exists() && !cacheDir.mkdirs()) {
            throw new RuntimeException("Could not create cache directory " + cacheDir + ".");
          }
          // An empty file (e.g. created by an earlier run that never stored an entry)
          // contains no serialized map; opening it would throw an EOFException.
          if (entriesFile.exists() && entriesFile.length() > 0) {
            try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(entriesFile))) {
              @SuppressWarnings("unchecked")
              Map<String, HttpCacheEntry> persistentEntries = (Map<String, HttpCacheEntry>) in.readObject();
              this.entries.putAll(persistentEntries);
              LOGGER.debug("Read {} HTTP entries from cache.", this.entries.size());
            }
          } else if (!entriesFile.exists()) {
            LOGGER.debug("No cached entries exist. Creating a new file at {}.", entriesFile);
            if (!entriesFile.createNewFile()) {
              throw new RuntimeException("Could not create entries file " + entriesFile + ".");
            }
          }
        } catch (IOException | ClassNotFoundException e) {
          throw new RuntimeException(e);
        }
      }
    
      private void writeEntries() throws IOException {
        synchronized (this) {
          try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(entriesFile))) {
            out.writeObject(entries);
          }
        }
      }
    
      @Override
      public void putEntry(String key, HttpCacheEntry entry) throws IOException {
        super.putEntry(key, entry);
        writeEntries();
      }
    
      @Override
      public HttpCacheEntry getEntry(String key) throws IOException {
        return super.getEntry(key);
      }
    
      @Override
      public void removeEntry(String key) throws IOException {
        super.removeEntry(key);
        writeEntries();
      }
    
      @Override
      public void updateEntry(String key, HttpCacheUpdateCallback callback) throws IOException {
        super.updateEntry(key, callback);
        writeEntries();
      }
    
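      // Note: super.shutdown() clears the cache and disposes of the cached resources,
      // so the entries file is deleted as well. Call this only to discard the cache.
      @Override
      public void shutdown() {
        super.shutdown();
        if (!entriesFile.delete()) {
          LOGGER.error("Could not delete entries file {}.", entriesFile);
        }
      }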
    }
    

    Usage:

        CacheConfig cacheConfig = CacheConfig.DEFAULT;
        File cacheDir = Path.of(System.getProperty("java.io.tmpdir")).resolve("my-http-cache").toFile();
        if (!cacheDir.exists() && !cacheDir.mkdirs()) {
          throw new RuntimeException("Could not create cache directory " + cacheDir + ".");
        }
        PersistentHttpCacheStorage storage = new PersistentHttpCacheStorage(cacheConfig, cacheDir);
        storage.initialize(); // Necessary for loading the persisted cache entries
        CloseableHttpClient client = CachingHttpClients.custom()
          .setCacheConfig(cacheConfig)
          .setHttpCacheStorage(storage)
          .setCacheDir(cacheDir)
          .setDeleteCache(false)
          .useSystemProperties()
          .build();
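
One caveat with rewriting the ENTRIES file in place on every update is that a crash in the middle of a write could leave a truncated file behind. A possible hardening, sketched below with JDK classes only, is to write to a temporary file and then move it over the target. The IndexFile class and its save/load methods are hypothetical helpers for illustration, not part of Apache HttpClient, and the map here holds plain strings rather than HttpCacheEntry objects.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not Apache code) for saving and loading a serializable index map. */
public final class IndexFile {
  private IndexFile() {}

  /** Writes the map to a temp file next to the target, then moves it into place. */
  public static <V extends Serializable> void save(Path target, Map<String, V> index) throws IOException {
    Path tmp = Files.createTempFile(target.toAbsolutePath().getParent(), "entries", ".tmp");
    try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(tmp))) {
      out.writeObject(new HashMap<String, V>(index)); // copy, so a plain HashMap is stored
    }
    try {
      // Atomic where the file system supports it; replacing an existing target
      // with ATOMIC_MOVE is implementation specific according to the JDK docs.
      Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    } catch (AtomicMoveNotSupportedException e) {
      Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
    }
  }

  /** Loads the map, treating a missing or empty file as "nothing persisted yet". */
  @SuppressWarnings("unchecked")
  public static <V extends Serializable> Map<String, V> load(Path source) throws IOException {
    if (!Files.exists(source) || Files.size(source) == 0) {
      return new HashMap<>();
    }
    // Java deserialization should only be used on files this application wrote itself.
    try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(source))) {
      return (Map<String, V>) in.readObject();
    } catch (ClassNotFoundException e) {
      throw new IOException("Unreadable index file " + source, e);
    }
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempDirectory("index-demo").resolve("ENTRIES");
    Map<String, String> index = new HashMap<>();
    index.put("https://example.org/a", "entry-a");
    save(file, index);
    Map<String, String> reloaded = load(file); // what a later run would do on start-up
    System.out.println(reloaded);
  }
}
```

If this approach were adopted in PersistentHttpCacheStorage, writeEntries() and initialize() would be the places to apply it. Whether the extra file move per update is worth it depends on how often the cache changes.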