18

I've a Java application that performs some aggregations on MongoDB, but sometimes it just hangs and throw a SocketTimeout exception. After the Exception the app will run just fine (for a bit, then it will probably raise again the exception).

I've just found this explanation that seems a possible cause but I'm not sure.

I initalize the MongoClient and keep the connection to the DB open. I'm not sure if this could be a problem and I should just get everytime the database and then let the database to be garbage collected (and close the connection).

Another approach could be ping periodically Mongo to keep the connection pool "fresh".

The client used is something like this:

public class DbClient {

    private static MongoClient mongoClient;
    private static MongoDatabase db;

    private DbClient() {}

    public static void init() throws Exception {
        mongoClient = new MongoClient();
    }

    public static MongoDatabase getDB() {
        if(mongoClient == null)
            throw new IllegalStateException("Client not initialized!");

        if(db == null) {
            db = mongoClient.getDatabase("my_db");
        }
        return db;
    }
}

Is this the possible cause of the SocketTimeout?

This is the exception thrown:

09:20:45.742 [qtp605535417-46] INFO  org.mongodb.driver.connection - Closed connection [connectionId{localValue:16, serverValue:6562}] to myapp.com:27017 because there was a socket exception raised by this connection.
09:20:45.743 [qtp605535417-46] ERROR myapp.service.Api - Error processing request
com.mongodb.MongoSocketReadTimeoutException: Timeout while receiving message
    at com.mongodb.connection.InternalStreamConnection.translateReadException(InternalStreamConnection.java:474) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.connection.InternalStreamConnection.receiveMessage(InternalStreamConnection.java:225) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.connection.UsageTrackingInternalConnection.receiveMessage(UsageTrackingInternalConnection.java:102) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.connection.DefaultConnectionPool$PooledConnection.receiveMessage(DefaultConnectionPool.java:435) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.connection.CommandProtocol.execute(CommandProtocol.java:112) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:159) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:286) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.connection.DefaultServerConnection.command(DefaultServerConnection.java:173) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:215) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:206) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:112) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.operation.FindOperation$1.call(FindOperation.java:487) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.operation.FindOperation$1.call(FindOperation.java:482) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.operation.OperationHelper.withConnectionSource(OperationHelper.java:239) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.operation.OperationHelper.withConnection(OperationHelper.java:212) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.operation.FindOperation.execute(FindOperation.java:482) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.operation.FindOperation.execute(FindOperation.java:79) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.Mongo.execute(Mongo.java:772) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.Mongo$2.execute(Mongo.java:759) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.OperationIterable.iterator(OperationIterable.java:47) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.FindIterableImpl.iterator(FindIterableImpl.java:143) ~[mongo-java-driver-3.2.2.jar!/:na]
    at myapp.common.db.service.dao.AnalysisMongoImpl.getAnalysis(AnalysisMongoImpl.java:66) ~[common-0.2.0-SNAPSHOT.jar!/:na]
    at myapp.common.db.service.AnalysisServiceImpl.getAnalysis(AnalysisServiceImpl.java:31) ~[common-0.2.0-SNAPSHOT.jar!/:na]
    at myapp.aggregator.service.Api$1.handle(Api.java:88) ~[aggregator-0.2.0-SNAPSHOT.jar!/:na]
    at spark.webserver.MatcherFilter.doFilter(MatcherFilter.java:139) [spark-core-1.1.1.jar!/:na]
    at spark.webserver.JettyHandler.doHandle(JettyHandler.java:54) [spark-core-1.1.1.jar!/:na]
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:179) [jetty-server-9.0.2.v20130417.jar!/:9.0.2.v20130417]
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136) [jetty-server-9.0.2.v20130417.jar!/:9.0.2.v20130417]
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) [jetty-server-9.0.2.v20130417.jar!/:9.0.2.v20130417]
    at org.eclipse.jetty.server.Server.handle(Server.java:451) [jetty-server-9.0.2.v20130417.jar!/:9.0.2.v20130417]
    at org.eclipse.jetty.server.HttpChannel.run(HttpChannel.java:252) [jetty-server-9.0.2.v20130417.jar!/:9.0.2.v20130417]
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:266) [jetty-server-9.0.2.v20130417.jar!/:9.0.2.v20130417]
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.run(AbstractConnection.java:240) [jetty-io-9.0.2.v20130417.jar!/:9.0.2.v20130417]
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:596) [jetty-util-9.0.2.v20130417.jar!/:9.0.2.v20130417]
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:527) [jetty-util-9.0.2.v20130417.jar!/:9.0.2.v20130417]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_95]
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method) ~[na:1.7.0_95]
    at java.net.SocketInputStream.read(SocketInputStream.java:152) ~[na:1.7.0_95]
    at java.net.SocketInputStream.read(SocketInputStream.java:122) ~[na:1.7.0_95]
    at com.mongodb.connection.SocketStream.read(SocketStream.java:85) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.connection.InternalStreamConnection.receiveResponseBuffers(InternalStreamConnection.java:491) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.connection.InternalStreamConnection.receiveMessage(InternalStreamConnection.java:221) ~[mongo-java-driver-3.2.2.jar!/:na]
    ... 34 common frames omitted
9
  • You need to post the exception, its message, and the stack trace. In your question. None of those member fields or methods should be static. Commented Feb 24, 2016 at 9:20
  • @EJP hm, why? I'm guessing that the getDB() could be the issue, but as I've read I should handle the MongoClient as a Singleton. Commented Feb 24, 2016 at 9:28
  • Client != Cursor && Client != Connection. Actually, the client maintains a connection pool. Using these connection, operations are executed on the server, like queries. In this case, a cursor is returned. In case the operation takes too long, a timeout is triggered and the connection closed server side. Which, from what I can see, is what happened. Commented Feb 24, 2016 at 9:31
  • @MarkusWMahlberg I don't think this is the case. I've a query timeout, and these are thrown also for not heavy aggregations. In case of long running queries the exception thrown is the correct one. collection.aggregate(pipeline).maxTime(maxTimeout, TimeUnit.SECONDS); and from the logs I can see that the exception is raised BEFORE the aggregation. Commented Feb 24, 2016 at 9:35
  • Well, I'd double-check the assumption that the aggregations should be fast enough by running them on the shell. A bad index combined with high load can wreak havoc on performance, where without load it was ok(ish). Commented Feb 24, 2016 at 9:39

3 Answers 3

24

After few tries I've found out that it was a problem with the Azure's Load Balancer.
After 60s of inactivity it will disconnect any pending TCP connection.

After further digging I've found this post of the MongoDB diagnostics FAQ, and I've set the tcp keepalive to 120s:

sudo sysctl -w net.ipv4.tcp_keepalive_time=<value>

and I've also set the socketKeepAlive of the MongoClient to true:

MongoClientOptions.Builder options = MongoClientOptions.builder();
options.socketKeepAlive(true);
mongoClient = new MongoClient(mongoAddress, options.build());

After these fixes the issue seems gone!

Sign up to request clarification or add additional context in comments.

5 Comments

Where did you run this command? sudo sysctl -w net.ipv4.tcp_keepalive_time=<value>
Directly on the server (linux)
socketKeepAlive is deprecated
deprecated in favor of what? your comment is half useful
It's deprecated in favour of simply not setting it. From the JavaDoc; "configuring keep-alive has been deprecated. It now defaults to true and disabling it is not recommended."
4

If you experience socket errors between clients and servers or between members of a sharded cluster or replica set that do not have other reasonable causes, check the TCP keepalive value (for example, the tcp_keepalive_time value on Linux systems). A common keepalive period is 7200 seconds (2 hours); however, different distributions and macOS may have different settings.

For MongoDB, you will have better results with shorter keepalive periods, on the order of 120 seconds (two minutes).

Where you have installed mongodb you have to simply run this command on Linux

sudo sysctl -w net.ipv4.tcp_keepalive_time=120

Reference: Does TCP keepalive time affect MongoDB Deployments?

4 Comments

I already have net.ipv4.tcp_keepalive_time = 120 on the host and I believe keep-alive setting defaults to true now. However I am still getting org.mongodb.driver.connection : Got socket exception on connection occasionally.
@NitishDeshpande well I do not remember right now anymore, since we tried many different stuff and I am also not with the company at the time anymore. If I am not mistaken, we did something like re-try logic, were when Azure would drop connection, we would reestablish it. I do not have a snippet, to demonstrate how we solved it.
@NitishDeshpande Were you able to resolve this issue?
This fixed the issue. Thanks.
2

I just met the same problem and solved it by increasing the spring.data.mongodb.socketTimeout=<value> in Spring Boot config. By default, the value is 0 and means no timeout. However, my colleague set it to 20000(20 seconds), so the MongoSocketReadTimeoutException will be thrown out as soon as my aggregate() runs longer than 20 seconds!

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.