My application needs to fetch resources (images, CSS, fonts, etc.) from given URLs and cache them locally based on the Cache-Control/ETag headers returned with the resource.

I’m using Apache HttpClient 5 with the cache module:

<dependency>
  <groupId>org.apache.httpcomponents.client5</groupId>
  <artifactId>httpclient5</artifactId>
  <version>5.3.1</version>
</dependency>
<dependency>
  <groupId>org.apache.httpcomponents.client5</groupId>
  <artifactId>httpclient5-cache</artifactId>
  <version>5.3.1</version>
</dependency>

Apache HttpClient successfully caches resources locally and fetches a new version once the old one has expired. However, old versions of the resources are not removed automatically from the local cache.

Here’s my test implementation:

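import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.hc.client5.http.classic.methods.HttpGet;
import org.apache.hc.client5.http.impl.cache.CacheConfig;
import org.apache.hc.client5.http.impl.cache.CachingHttpClients;
import org.apache.hc.client5.http.impl.cache.ManagedHttpCacheStorage;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.core5.http.io.HttpClientResponseHandler;
import org.apache.hc.core5.util.TimeValue;
import org.springframework.stereotype.Component;

@Component
public class ResourceCache {

  private final CloseableHttpClient client;

  public ResourceCache() {
    // Deliberately tiny limits so that eviction and expiry are easy to observe.
    CacheConfig cacheConfig = CacheConfig.custom()
        .setMaxCacheEntries(2)
        .setMaxObjectSize(10 * 1024 * 1024)
        .setSharedCache(false)
        .setHeuristicCachingEnabled(true)
        .setHeuristicDefaultLifetime(TimeValue.ofMinutes(2))
        .build();
    // Created explicitly so that cleanResources() can be called on it later.
    ManagedHttpCacheStorage storage = new ManagedHttpCacheStorage(cacheConfig);

    client = CachingHttpClients.custom()
        .setCacheDir(new File("/my_cache"))
        .setCacheConfig(cacheConfig)
        .setHttpCacheStorage(storage)
        .build();

    // Call cleanResources() every 30 seconds (the executor is never shut down in this test).
    ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
    ses.scheduleAtFixedRate(storage::cleanResources, 30, 30, TimeUnit.SECONDS);
  }

  public void fetchAndCache() {
    List<String> resources = List.of(
        "https://img.shields.io/npm/v/react.svg",  // cache-control: max-age=300, s-maxage=300; no ETag
        "https://jpeg.org/images/jpeg-home.jpg",   // ETag only
        "http://httpbin.org/image/png"             // no cache-control, no ETag
    );

    for (String resource : resources) {
      HttpGet request = new HttpGet(resource);
      // Read the whole body so that the response is fully consumed and can be cached.
      HttpClientResponseHandler<byte[]> handler = response -> {
        if (response.getEntity() != null) {
          return response.getEntity().getContent().readAllBytes();
        }
        return new byte[0];
      };

      try {
        // The bytes are not used here; this test only exercises the cache.
        byte[] data = client.execute(request, handler);
      } catch (IOException e) {
        e.printStackTrace();
      }
    }
  }
}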

Observed behavior:

  • https://img.shields.io/npm/v/react.svg → Has Cache-Control: max-age=300. After it expires, a new version is fetched, but the old one still exists. (Also, for some reason, this specific resource seems to be fetched twice each time.)
  • https://jpeg.org/images/jpeg-home.jpg → Has an ETag. A new version is not fetched (since the ETag hasn’t changed). Expected.
  • http://httpbin.org/image/png → No Cache-Control or ETag. HttpClient applies heuristic caching (a 2-minute default lifetime in my config). After expiration, a new version is fetched, but again, the old version remains in the cache.
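
To make the three cases above concrete, this is how I understand the freshness decision. It is a deliberately simplified model based on general HTTP caching rules (RFC 9111), not HttpClient's actual code: an explicit max-age wins, otherwise the configured heuristic default applies. FreshnessSketch and its methods are hypothetical names, and the sketch ignores ETag revalidation, Last-Modified heuristics, and every other directive.

```java
import java.time.Duration;
import java.util.Locale;
import java.util.Optional;

// Hypothetical, simplified model of a freshness check; NOT HttpClient's implementation.
final class FreshnessSketch {

    // Mirrors setHeuristicDefaultLifetime(TimeValue.ofMinutes(2)) from my CacheConfig.
    static final Duration HEURISTIC_DEFAULT = Duration.ofMinutes(2);

    // Extracts the max-age directive from a Cache-Control header value, if present.
    static Optional<Duration> maxAge(String cacheControl) {
        if (cacheControl == null) {
            return Optional.empty();
        }
        String prefix = "max-age=";
        for (String directive : cacheControl.split(",")) {
            String d = directive.trim().toLowerCase(Locale.ROOT);
            if (d.startsWith(prefix)) {
                try {
                    long seconds = Long.parseLong(d.substring(prefix.length()).trim());
                    return Optional.of(Duration.ofSeconds(seconds));
                } catch (NumberFormatException e) {
                    return Optional.empty(); // malformed value: treat as absent
                }
            }
        }
        return Optional.empty();
    }

    // An entry is fresh while its age is strictly below its freshness lifetime.
    static boolean isFresh(String cacheControl, Duration age) {
        Duration lifetime = maxAge(cacheControl).orElse(HEURISTIC_DEFAULT);
        return age.compareTo(lifetime) < 0;
    }
}
```

Under this model the shields.io badge goes stale after 300 seconds and the httpbin image after 2 minutes, which matches the refetches I see. It says nothing about what should happen to the stale file on disk, which is the part I'm asking about.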

I’m creating my own ManagedHttpCacheStorage so I can call cleanResources() on it periodically from a scheduled job. However, cleanResources() does not remove files from the cache folder.

The documentation says that this type of storage can deallocate resources. Does this mean it only removes references from memory but leaves the files on disk?
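
As far as I can tell from the Javadoc, ManagedHttpCacheStorage tracks cache entries with PhantomReference and only disposes a resource once its entry has been garbage-collected and cleanResources() has been called. The sketch below is my own stdlib-only model of that pattern, not library code; ResourceTracker, TrackedRef, track, and clean are hypothetical names. If my reading is right, a file would stay on disk for as long as the cache still holds the entry, or until the GC actually collects it.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of phantom-reference based resource deallocation.
final class ResourceTracker {

    // A phantom reference that remembers how to release the entry's backing resource.
    static final class TrackedRef extends PhantomReference<Object> {
        final Runnable dispose;

        TrackedRef(Object entry, ReferenceQueue<Object> queue, Runnable dispose) {
            super(entry, queue);
            this.dispose = dispose;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Keeps the references themselves reachable until they have been processed.
    private final Set<TrackedRef> live = ConcurrentHashMap.newKeySet();

    // Registers an entry together with the action that frees its resource (e.g. deletes a file).
    TrackedRef track(Object entry, Runnable dispose) {
        TrackedRef ref = new TrackedRef(entry, queue, dispose);
        live.add(ref);
        return ref;
    }

    // Disposes resources of entries whose references were enqueued; returns how many were disposed.
    int clean() {
        int disposed = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            TrackedRef tracked = (TrackedRef) ref;
            live.remove(tracked);
            tracked.dispose.run();
            disposed++;
        }
        return disposed;
    }
}
```

In this model, calling clean() while an entry is still strongly reachable does nothing, which would be consistent with what I observe when my scheduled job runs.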

Is there a way to automatically clean expired resources from disk with Apache HttpClient 5? Or do I need to implement my own cleanup logic?
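
If I do end up writing my own cleanup, this is roughly what I have in mind: a stdlib-only sketch that deletes regular files in the cache directory whose last-modified time is older than a cutoff. The class name, the method, and the age-based policy are all my own assumptions. It knows nothing about HttpClient's in-memory index, so it could delete a file that a live cache entry still points to.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical age-based sweeper for a cache directory; not part of HttpClient.
final class CacheDirSweeper {

    // Deletes regular files directly inside cacheDir that were last modified
    // more than maxAge before 'now'. Returns the paths that were deleted.
    static List<Path> deleteOlderThan(Path cacheDir, Duration maxAge, Instant now) throws IOException {
        List<Path> deleted = new ArrayList<>();
        if (!Files.isDirectory(cacheDir)) {
            return deleted; // nothing to do if the directory is missing
        }
        Instant cutoff = now.minus(maxAge);
        try (Stream<Path> files = Files.list(cacheDir)) {
            for (Path file : (Iterable<Path>) files::iterator) {
                if (Files.isRegularFile(file)
                        && Files.getLastModifiedTime(file).toInstant().isBefore(cutoff)
                        && Files.deleteIfExists(file)) {
                    deleted.add(file);
                }
            }
        }
        return deleted;
    }
}
```

I would run this from the same scheduled job that currently calls cleanResources(), for example with a cutoff somewhat longer than the longest freshness lifetime I expect. I would still prefer a built-in mechanism over this.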

Additionally, I noticed that if I manually delete the cache folder while the application is still running, resources are no longer cached and are fetched from the web each time.

Cache folder content
