aeron

Aeron stream/session becomes unavailable on one host and never recovers


I have an Aeron stream over multicast. Some hosts subscribed to the stream see new sessions, but their images go unavailable very quickly and never become available again, while other hosts successfully receive data from the same session.

On my Aeron stream I have three hosts, A, B and C, that all publish and subscribe. When I start my publication from A, both B and C initially see it (confirmed via AVAILABLE_IMAGE events in the event log). However, after 11s host C reports the image as unavailable. It never becomes available again, even after restarting the publication (the restarted publication reuses the same session ID). In the meantime, host B continues to receive data successfully from this session. After restarting the driver, everything works correctly.

One difference between the subscription on C and the subscription on B is that the subscription on C uses tether=true. It should also be noted that other streams' sessions on the same multicast group, published from host A and received by host B, are working correctly, so it doesn't seem to be a network issue.
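For illustration, tether is set as a parameter on the subscription's channel URI. The line below is only a sketch of what the subscription channel on C might look like; the multicast address, port, and interface are hypothetical placeholders, not values from the setup described here:

```
aeron:udp?endpoint=224.0.1.1:40456|interface=192.168.1.0/24|tether=true
```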

I'd expect the images to become available again as long as the subscribing host continues to see data for those sessions. However, this never happens.

What could be causing the initial unavailable image, and why doesn't it eventually try recreating it?


Solution

  • This was caused by several bugs present in Aeron 1.40.0 and earlier. Fixes have been submitted for all causes of stuck sessions that I'm aware of as of today.