We have a Scala server which uses the Java MongoDB driver as wrapped by Casbah. Recently, we switched its database over from an actual MongoDB to Azure CosmosDB, using the Mongo API. This generally works fine; however, every once in a while a call to Cosmos fails with a MongoSocketWriteException (stack trace below).
We're creating the client as follows:
import com.mongodb.casbah.Imports._
val mongoUrl = "mongodb://username:password@host.documents.azure.com:10255/?ssl=true&replicaSet=globaldb"
val client = MongoClient(MongoClientURI(mongoUrl))
val collection: MongoCollection = client("mongoDatabase")("mongoCollection")
We tried removing &replicaSet=globaldb from the connection URI, as per the suggested workaround for this seemingly similar bug (How to solve MongoError: pool destroyed while connecting to CosmosDB), but it didn't fix the problem.
Stack trace:
com.mongodb.MongoSocketWriteException: Exception sending message
at com.mongodb.connection.InternalStreamConnection.translateWriteException(InternalStreamConnection.java:462)
at com.mongodb.connection.InternalStreamConnection.sendMessage(InternalStreamConnection.java:205)
at com.mongodb.connection.UsageTrackingInternalConnection.sendMessage(UsageTrackingInternalConnection.java:95)
at com.mongodb.connection.DefaultConnectionPool$PooledConnection.sendMessage(DefaultConnectionPool.java:424)
at com.mongodb.connection.CommandProtocol.sendMessage(CommandProtocol.java:209)
at com.mongodb.connection.CommandProtocol.execute(CommandProtocol.java:111)
at com.mongodb.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:159)
at com.mongodb.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:286)
at com.mongodb.connection.DefaultServerConnection.command(DefaultServerConnection.java:173)
at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:215)
at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:206)
at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:112)
at com.mongodb.operation.CountOperation$1.call(CountOperation.java:210)
at com.mongodb.operation.CountOperation$1.call(CountOperation.java:206)
at com.mongodb.operation.OperationHelper.withConnectionSource(OperationHelper.java:230)
at com.mongodb.operation.OperationHelper.withConnection(OperationHelper.java:203)
at com.mongodb.operation.CountOperation.execute(CountOperation.java:206)
at com.mongodb.operation.CountOperation.execute(CountOperation.java:53)
at com.mongodb.Mongo.execute(Mongo.java:772)
at com.mongodb.Mongo$2.execute(Mongo.java:759)
at com.mongodb.DBCollection.getCount(DBCollection.java:962)
at com.mongodb.DBCursor.count(DBCursor.java:670)
at com.mongodb.casbah.MongoCollectionBase.getCount(MongoCollection.scala:496)
at com.mongodb.casbah.MongoCollectionBase.getCount$(MongoCollection.scala:488)
at com.mongodb.casbah.MongoCollection.getCount(MongoCollection.scala:1106)
at com.mongodb.casbah.MongoCollectionBase.count(MongoCollection.scala:897)
at com.mongodb.casbah.MongoCollectionBase.count$(MongoCollection.scala:894)
at com.mongodb.casbah.MongoCollection.count(MongoCollection.scala:1106)
[snip]
Caused by: java.net.SocketException: Broken pipe (Write failed)
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431)
at sun.security.ssl.OutputRecord.write(OutputRecord.java:417)
at sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:876)
at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:847)
at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
at com.mongodb.connection.SocketStream.write(SocketStream.java:75)
at com.mongodb.connection.InternalStreamConnection.sendMessage(InternalStreamConnection.java:201)
... 38 common frames omitted
(Posting this with an answer because I'm hoping the solution will be useful for others, and because I'd welcome any further insight.)
The problem went away after we added &maxIdleTimeMS=1500000 to the connection URI, which sets the maximum connection idle time to 25 minutes.
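For reference, here is a minimal sketch of the resulting connection string, built on the placeholder URI from the question. The object name CosmosUri and the val names are illustrative only; the host and credentials are the same placeholders as above.

```scala
import java.util.concurrent.TimeUnit

object CosmosUri {
  // Stay below the apparent 30-minute server-side idle timeout.
  val maxIdleTimeMs: Long = TimeUnit.MINUTES.toMillis(25) // 25 min = 1500000 ms

  // Same placeholder URI as in the question, with maxIdleTimeMS appended.
  val mongoUrl: String =
    "mongodb://username:password@host.documents.azure.com:10255/" +
      s"?ssl=true&replicaSet=globaldb&maxIdleTimeMS=$maxIdleTimeMs"
}

// The client is then created exactly as in the question:
//   val client = MongoClient(MongoClientURI(CosmosUri.mongoUrl))
```

Deriving the value from a number of minutes keeps the relationship to the 30-minute limit visible in the code, rather than leaving a bare 1500000 in the URI.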
The cause seems to be a 30-minute timeout for idle connections on the Azure server, whereas the default behaviour for Mongo clients is no idle timeout at all. The server does not tell the client when it drops an idle connection, so the next attempt to use that connection fails with the above error. Setting the maximum connection idle time to a value less than 30 minutes makes our server close idle connections before the Azure server kills them. Some sort of keep-alive, or a check before using a connection, would probably also be possible.
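As a complement to the idle timeout (and not something we have tested against CosmosDB), one could also retry a call once when it fails on a dead socket. The sketch below is plain Scala with no driver dependency; the names StaleConnectionRetry, retryOnce and causedBySocketError are hypothetical. It assumes the driver discards the broken connection after the failure so that the retry gets a fresh one, and it is only appropriate for operations that are safe to repeat, such as the count in the stack trace.

```scala
object StaleConnectionRetry {
  // Runs `op`; if it throws an exception accepted by `isStale`,
  // runs it exactly one more time and lets any second failure propagate.
  def retryOnce[T](isStale: Throwable => Boolean)(op: => T): T =
    try op
    catch {
      case e: Exception if isStale(e) => op
    }

  // Hypothetical predicate: true if a java.net.SocketException (e.g. "Broken pipe")
  // appears anywhere in the cause chain, as it does in the trace above.
  def causedBySocketError(e: Throwable): Boolean =
    Iterator
      .iterate(e)(_.getCause)
      .takeWhile(_ != null)
      .exists(_.isInstanceOf[java.net.SocketException])
}
```

With the collection from the question, usage would look like StaleConnectionRetry.retryOnce(StaleConnectionRetry.causedBySocketError) { collection.count() }.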
I haven't actually been able to find any documentation about this, or other references to this problem for CosmosDB, although it may be caused by or related to the 30-minute idle timeout for TCP connections on Azure Internal Load Balancers (see e.g. https://feedback.azure.com/forums/217313-networking/suggestions/18823588-increase-idle-timeout-on-internal-load-balancers-t).