aeron

Aeron ReplayMerge never merges


My ReplayMerge gets stuck in state ATTEMPT_LIVE_JOIN, then times out due to no progress. It adds the live destination with no issues (I see the corresponding subscription appear in aeron-stat and the onImageAvailable callback is invoked). Eventually it catches up fully but doesn't transition to the next state.

After some investigation, I found that the problematic check is in shouldStopAndRemoveReplay: the condition image.activeTransportCount() >= 2 is never true because image.activeTransportCount() stays at 1. If it weren't for that check, the ReplayMerge would succeed.

Here are my ReplayMerge parameters:

replayChannel = "aeron:udp"
replayDestination = "aeron:udp?endpoint=localhost:0"
liveDestination = "aeron:udp?endpoint=localhost:0|control=localhost:12345"

I've tried both the Java client and the C++ client. What am I missing?

EDIT: aeron-stat on the client side looks like this:

 42:                    1 - rcv-local-sockaddr: 41 <some IP address>:54709
 43:          452,985,472 - sub-pos: 24 -106708072 3000 aeron:udp?control-mode=manual @0
 44:          452,985,472 - rcv-hwm: 28 -106708072 3000 aeron:udp?control-mode=manual
 45:          452,985,472 - rcv-pos: 28 -106708072 3000 aeron:udp?control-mode=manual
 46:                    1 - rcv-local-sockaddr: 41 0.0.0.0:39238
 47:          452,971,520 - sub-pos: 24 -106708098 3000 aeron:udp?control-mode=manual @452971520
 48:          452,985,472 - rcv-hwm: 89 -106708098 3000 aeron:udp?control-mode=manual
 49:          452,971,520 - rcv-pos: 89 -106708098 3000 aeron:udp?control-mode=manual

The first driver subscription is from the replayDestination. All the numbers go up as you would expect, like a normal replay.

The second one is from the added liveDestination. Once created it doesn't catch up at all, contrary to my initial assessment above. sub-pos and rcv-pos are stuck at the initial position of 452971520, but the rcv-hwm goes up together with the position of the replay subscription. Doesn't this indicate that data is being received but not read on the live destination subscription?

I noticed that the ReplayMerge#image is simply defined as

image = subscription.imageBySessionId((int)replaySessionId);

So I tried to instead poll the Subscription I passed to the ReplayMerge constructor so that both images would get polled internally. That did not help.


Solution

  • I fixed my issue (encountered with this code) by ensuring the replayChannel passed to the ReplayMerge is session ID-specific.

    File ReplayMergeTest.java in the aeron codebase does it with

    private final String publicationChannel = new ChannelUriStringBuilder()
        // ...
        .tags("1," + PUBLICATION_TAG)
        // ...
        ;
    
    private final String replayChannel = new ChannelUriStringBuilder()
        .media(CommonContext.UDP_MEDIA)
        .isSessionIdTagged(true)
        .sessionId(PUBLICATION_TAG)
        .build();
    

    so that the session ID of the replay channel is set to that of the Publication associated with tag PUBLICATION_TAG. This also works when the publishing media driver and the subscribing media driver are distinct, but you still have to communicate the Publication tag to the subscriber somehow, which might be inconvenient.

    So the solution I'll be going for is to take the session ID from the recording descriptor of the recording to be replayed, at the earlier point where I discover recordings with AeronArchive#listRecordingsForUri (or similar).

    This gist shows a working ReplayMerge across two media drivers.
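
The recording-descriptor approach described above could be sketched roughly as follows. This is only an illustration, not the code from the gist: the class and method names (ReplayChannels, sessionSpecificReplayChannel) are hypothetical, and the URI is built by hand with the session-id channel parameter so that the snippet has no dependency on the Aeron jars. In a real client, the session ID would be the sessionId argument delivered to the RecordingDescriptorConsumer callback, and ChannelUriStringBuilder#sessionId would be the more idiomatic way to build the channel.

```java
// Minimal sketch: build a session ID-specific replay channel for ReplayMerge.
// ReplayChannels and sessionSpecificReplayChannel are hypothetical names.
final class ReplayChannels {
    private ReplayChannels() {
    }

    // sessionId is assumed to come from the recording descriptor of the
    // recording to be replayed (e.g. captured while listing recordings).
    static String sessionSpecificReplayChannel(final int sessionId) {
        // "session-id" is the Aeron channel URI parameter that pins a session ID.
        return "aeron:udp?session-id=" + sessionId;
    }

    public static void main(final String[] args) {
        // Hypothetical session ID, standing in for the one from the descriptor.
        final int recordedSessionId = 1234567;

        // prints "aeron:udp?session-id=1234567"
        System.out.println(sessionSpecificReplayChannel(recordedSessionId));
    }
}
```

The resulting string would then be passed as the replayChannel argument of the ReplayMerge constructor in place of the plain "aeron:udp" from the question.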