java spring-boot spring-cloud-sleuth mdc micrometer-tracing

Baggage is shared between threads after Spring Boot 3 migration


While migrating from Sleuth to Micrometer, I ran into a race condition when updating the MDC context map (a complete example can be found here). It boils down to a wrong implementation of the ExecutorService that runs the submitted tasks.

The core logic of the code was unchanged:

ExecutorService executorService = buildExecutorService(threads);
List<CompletableFuture<O>> futures = inputList.stream().map(
    input -> CompletableFuture.supplyAsync(() ->
        businessLogic.apply(input), executorService))
    .toList();
CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).get();

where the MDC is supposed to be updated inside businessLogic:

public Object businessLogic(Object input) {
    BaggageField.getByName(MY_BAGGAGE).updateValue(String.valueOf(input));
    // ...actualBusinessLogic...
}

The main change (well, besides the update itself!) was in how the ExecutorService is built:

//Spring Boot 2 with Sleuth
private TraceableExecutorService produceExecutor(int threads, String spanName) {
    return new TraceableExecutorService(this.beanFactory,
        Executors.newFixedThreadPool(threads), spanName);
}

//Spring Boot 3 with Micrometer
private ExecutorService produceExecutor(int threads) {
    ContextRegistry.getInstance().registerThreadLocalAccessor(new ObservationAwareSpanThreadLocalAccessor(this.tracer));
    return ContextExecutorService.wrap(
        Executors.newFixedThreadPool(threads), ContextSnapshot::captureAll);
}

The issue is that TraceableExecutorService creates a new span for each invocation of the business logic, whereas ContextExecutorService does not merely propagate (copy) the context: it shares it between threads. Since the MDC is thread-bound, concurrent baggage updates then race with each other.

Is there a proper way to replicate the old Sleuth behavior (one span for each invocation or thread) without reimplementing TraceableExecutorService from scratch? The migration docs don't seem that helpful.

Relevant GitHub issue here.


Solution

  • Currently, the only solution is to reimplement TraceableExecutorService/TraceCallable/TraceRunnable by reworking the code from Sleuth, as suggested in the GitHub issue. Micrometer provides no out-of-the-box solution.
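
To make the idea behind such a reimplementation concrete, here is a minimal JDK-only sketch of the "one scope per task" pattern. It is not the Sleuth or Micrometer code: CONTEXT is a hypothetical stand-in for the thread-bound MDC map, and scoped, perTask and run are illustrative names invented for this example. In an actual port, the copy step would presumably be replaced by starting a child span and putting it in scope for the duration of the task.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PerTaskContextSketch {

    // Hypothetical stand-in for the thread-bound MDC map (not SLF4J's MDC).
    static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    // Wraps a task so that it runs with its own private copy of the
    // submitter's context, roughly what a per-invocation span would provide.
    static Runnable scoped(Runnable task) {
        // Snapshot taken on the submitting thread, at submission time.
        Map<String, String> parent = new HashMap<>(CONTEXT.get());
        return () -> {
            Map<String, String> previous = CONTEXT.get();
            CONTEXT.set(new HashMap<>(parent)); // fresh copy for this task only
            try {
                task.run();
            } finally {
                CONTEXT.set(previous); // leave the pooled worker thread clean
            }
        };
    }

    // Decorates an executor so that every submitted task is scoped.
    static Executor perTask(Executor delegate) {
        return task -> delegate.execute(scoped(task));
    }

    // Mirrors the question's flow: each task writes its own "baggage" value
    // and records what it reads back from its context.
    static Map<String, String> run(int threads, List<String> inputs) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Executor executor = perTask(pool);
        Map<String, String> observed = new ConcurrentHashMap<>();
        try {
            CONTEXT.get().put("requestId", "parent");
            CompletableFuture<?>[] futures = inputs.stream()
                    .map(input -> CompletableFuture.runAsync(() -> {
                        Map<String, String> ctx = CONTEXT.get();
                        ctx.put("myBaggage", input);
                        observed.put(input,
                                ctx.get("myBaggage") + "/" + ctx.get("requestId"));
                    }, executor))
                    .toArray(CompletableFuture[]::new);
            CompletableFuture.allOf(futures).join();
        } finally {
            pool.shutdown();
            CONTEXT.remove();
        }
        return observed;
    }
}
```

Calling PerTaskContextSketch.run(4, List.of("a", "b", "c")) gives every task its own map, so each input reads back its own baggage value together with the inherited requestId, and no task can overwrite another task's value. The design choice that matters is that the snapshot is taken at submission time and copied per task, rather than one mutable context object being handed to all worker threads.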