azure azure-iot-hub azure-iot-edge

Azure IoTEdge - Custom IoTEdge module unable to connect to Hub with Communication_Error


Our custom IoTEdge module was functioning correctly until recently, when it started losing connectivity. Our ConnectionStatusChangeHandler reports status "ConnectionStatus.Disconnected" with reasons "ConnectionStatusChangeReason.Retry_Expired" and "ConnectionStatusChangeReason.Communication_Error".

The handler triggers re-initialization of the client as per recommended practices, but the loss of connectivity is continuous: there are no intermittent successful connections to the hub that would indicate a transient condition.
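For illustration, the handler logic described above can be sketched roughly as follows. This is a minimal model in Python and not our actual module code: the status and reason strings mirror the .NET SDK enum names quoted above, while `ReconnectPolicy`, the action names, and the threshold of 5 are hypothetical. The point is the counter, which separates a transient blip from a continuous outage like ours.

```python
from dataclasses import dataclass

# Reasons for which re-creating the client is the usual response.
REINIT_REASONS = {"Retry_Expired", "Communication_Error"}


@dataclass
class ReconnectPolicy:
    """Hypothetical helper that maps a status change to an action."""

    max_consecutive_reinits: int = 5  # assumed threshold, tune as needed
    consecutive_reinits: int = 0

    def on_status_change(self, status: str, reason: str) -> str:
        if status == "Connected":
            # Any successful connection resets the outage counter.
            self.consecutive_reinits = 0
            return "none"
        if status == "Disabled":
            return "none"  # the client was closed deliberately
        if status == "Disconnected_Retrying":
            return "wait"  # the SDK is still retrying on its own
        if status == "Disconnected" and reason in REINIT_REASONS:
            self.consecutive_reinits += 1
            if self.consecutive_reinits > self.max_consecutive_reinits:
                # No successful connection in between: not transient.
                return "escalate"
            return "reinitialize"
        # e.g. Bad_Credential or Device_Disabled: retrying will not help.
        return "investigate"
```

In our case every event falls into the "reinitialize" branch and the counter never resets, which is why we treat the outage as persistent.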

One thing that stands out is that the EdgeHub module twin of the device does not list the custom module in the "clients" section of its reported properties, which seems to be related to the outage. Only the IoTEdgeMetricsCollector module is listed; it is connected to IoTHub and functioning normally.

  },
  "clients": {
    "$lastUpdated": "2024-05-22T06:41:40.9107753Z",
    "deviceId/IoTEdgeMetricsCollector": {
      "$lastUpdated": "2024-05-22T06:41:40.9107753Z",
      "status": {
        "$lastUpdated": "2024-05-22T06:41:40.9107753Z"
      },
      "lastConnectedTimeUtc": {
        "$lastUpdated": "2024-05-22T06:41:40.9107753Z"
      }
    }
  }
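The same check can be scripted once the twin JSON has been exported. The sketch below is in Python and assumes only the shape visible in the fragment above: a "clients" object keyed by "deviceId/moduleId", with "$"-prefixed keys being metadata. The function names and the module name "CustomModule" are placeholders, not part of any SDK.

```python
def listed_clients(section: dict) -> list:
    """Return the client identities under "clients", skipping $-metadata keys."""
    clients = section.get("clients", {})
    return sorted(key for key in clients if not key.startswith("$"))


def is_listed(section: dict, device_id: str, module_id: str) -> bool:
    """True if edgeHub reports the given module as a known client."""
    return f"{device_id}/{module_id}" in listed_clients(section)


# Sample data copied from the twin fragment above.
sample = {
    "clients": {
        "$lastUpdated": "2024-05-22T06:41:40.9107753Z",
        "deviceId/IoTEdgeMetricsCollector": {
            "$lastUpdated": "2024-05-22T06:41:40.9107753Z",
            "status": {"$lastUpdated": "2024-05-22T06:41:40.9107753Z"},
            "lastConnectedTimeUtc": {
                "$lastUpdated": "2024-05-22T06:41:40.9107753Z"
            },
        },
    }
}

print(listed_clients(sample))
print(is_listed(sample, "deviceId", "CustomModule"))  # placeholder module name
```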

It seems that the local EdgeHub module is ignoring our custom module.

Environment

Steps to repro

Unfortunately we are unable to reproduce the issue. We are not sure what started it, but we suspect that an accidental change was saved to the edgeHub's module twin using Azure IoT Explorer.

What we tried so far:

The custom module is successfully deployed, starts up and runs, but fails to send telemetry upstream, since any attempt to connect to the EdgeHub fails with the aforementioned ConnectionStatus and ConnectionStatusChangeReason.

HasStack="True" ThreadID="6,006" ProcessorNumber="0" thisOrContextObject="ErrorDelegatingHandler#34683734" memberName="ExecuteWithErrorHandlingAsync" message="Exception caught: System.Net.WebSockets.WebSocketException (0x80004005): Unable to connect to the remote server
 ---> System.Net.Http.HttpRequestException: The SSL connection could not be established, see inner exception.
 ---> System.IO.IOException: Received an unexpected EOF or 0 bytes from the transport stream.
   at System.Net.Security.SslStream.ReceiveBlobAsync[TIOAdapter](TIOAdapter adapter)
   at System.Net.Security.SslStream.ForceAuthenticationAsync[TIOAdapter](TIOAdapter adapter, Boolean receiveFirst, Byte[] reAuthenticationData, Boolean isApm)
   at System.Net.Http.ConnectHelper.EstablishSslConnectionAsync(SslClientAuthenticationOptions sslOptions, HttpRequestMessage request, Boolean async, Stream stream, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at System.Net.Http.ConnectHelper.EstablishSslConnectionAsync(SslClientAuthenticationOptions sslOptions, HttpRequestMessage request, Boolean async, Stream stream, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.ConnectAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.CreateHttp11ConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.AddHttp11ConnectionAsync(HttpRequestMessage request)
   at System.Threading.Tasks.TaskCompletionSourceWithCancellation`1.WaitWithCancellationAsync(CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.GetHttp11ConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.WebSockets.WebSocketHandle.ConnectAsync(Uri uri, CancellationToken cancellationToken, ClientWebSocketOptions options)
   at System.Net.WebSockets.WebSocketHandle.ConnectAsync(Uri uri, CancellationToken cancellationToken, ClientWebSocketOptions options)
   at System.Net.WebSockets.ClientWebSocket.ConnectAsyncCore(Uri uri, CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.Transport.AmqpIot.AmqpIotTransport.CreateClientWebSocketAsync(Uri websocketUri, CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.Transport.AmqpIot.AmqpIotTransport.CreateClientWebSocketTransportAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.Transport.AmqpIot.AmqpIotTransport.InitializeAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.Transport.Amqp.AmqpIotConnector.OpenConnectionAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.Transport.Amqp.AmqpConnectionHolder.EnsureConnectionAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.Transport.Amqp.AmqpConnectionHolder.OpenSessionAsync(IDeviceIdentity deviceIdentity, CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.Transport.AmqpIot.AmqpUnit.EnsureSessionIsOpenAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.Transport.AmqpIot.AmqpUnit.OpenAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.Transport.Amqp.AmqpTransportHandler.OpenAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.Transport.ProtocolRoutingDelegatingHandler.OpenAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.Transport.ErrorDelegatingHandler.<>c__DisplayClass27_0.<<ExecuteWithErrorHandlingAsync>b__0>d.MoveNext()
   --- End of stack trace from previous location ---
   at Microsoft.Azure.Devices.Client.Transport.ErrorDelegatingHandler.ExecuteWithErrorHandlingAsync[T](Func`1 asyncOperation)"
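The trace shows the failure happening during the TLS handshake of the AMQP-over-WebSocket connection, before any IoT Hub protocol traffic. One way to look at that handshake in isolation is a small probe run from inside the module container. The following is only a diagnostic sketch in Python, not something taken from the SDK. It assumes the gateway is reachable under the IOTEDGE_GATEWAYHOSTNAME environment variable that the Edge runtime injects into modules and that WebSocket traffic uses port 443. `probe_tls` and `gateway_host` are hypothetical helper names, and certificate verification is deliberately switched off so that trust errors do not mask a dropped handshake.

```python
import os
import socket
import ssl


def gateway_host(default: str = "localhost") -> str:
    """Host name of the local gateway as seen from inside a module."""
    return os.environ.get("IOTEDGE_GATEWAYHOSTNAME", default)


def probe_tls(host: str, port: int = 443, timeout: float = 5.0):
    """Attempt a bare TLS handshake and return (ok, detail)."""
    context = ssl.create_default_context()
    # Diagnostics only: skip certificate validation so that the result
    # reflects whether the peer completes a handshake at all.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return True, tls.version()
    except OSError as exc:  # includes ssl.SSLError and socket timeouts
        return False, f"{type(exc).__name__}: {exc}"
```

Calling `probe_tls(gateway_host())` returns a `(True, <TLS version>)` tuple when the peer completes the handshake, or `(False, <error text>)` when the connection is refused or dropped.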

We considered fully uninstalling and reinstalling the IoTEdge runtime on the device, but we would like to exhaust all other options before going there.

At this point we are running out of ideas on how to attack this issue. Any help would be appreciated.


Solution

  • Solved. The issue was caused by erroneously deleting the base deployment that puts the system modules on the device, and then following up with layered/manual deployments. The edgeAgent and edgeHub modules were still running, but they would not see any newly deployed custom modules. Strangely enough, edgeHub would happily register other non-custom modules from the marketplace, such as Simulated Temperature Sensor. Restoring the base deployment brought everything back to working order.
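To catch this earlier, a deployment manifest can be checked for the system module definitions before relying on it as the base deployment. The sketch below is in Python and assumes the standard manifest layout, where a full deployment defines `systemModules` under `modulesContent.$edgeAgent["properties.desired"]`, while a layered deployment only carries dotted paths such as `properties.desired.modules.<name>`. The helper name and the two sample manifests are hypothetical and heavily trimmed.

```python
def missing_system_modules(manifest: dict) -> list:
    """Return the system modules that the manifest does not define."""
    desired = (
        manifest.get("modulesContent", {})
        .get("$edgeAgent", {})
        .get("properties.desired", {})
    )
    system = desired.get("systemModules", {})
    return [name for name in ("edgeAgent", "edgeHub") if name not in system]


# Trimmed, hypothetical manifests for illustration only.
BASE = {
    "modulesContent": {
        "$edgeAgent": {
            "properties.desired": {
                "systemModules": {"edgeAgent": {}, "edgeHub": {}},
                "modules": {},
            }
        }
    }
}
LAYERED = {
    "modulesContent": {
        "$edgeAgent": {"properties.desired.modules.CustomModule": {}}
    }
}

print(missing_system_modules(BASE))     # nothing missing in a base deployment
print(missing_system_modules(LAYERED))  # a layered deployment defines neither
```

An empty result only means the manifest defines the system modules. It says nothing about whether that deployment is actually targeted at the device.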