
Production environment is on Azure, using Redis Cache Standard 2.5GB.

Example 1

System.Web.HttpUnhandledException (0x80004005): Exception of type 'System.Web.HttpUnhandledException' was thrown. ---> StackExchange.Redis.RedisTimeoutException: Timeout performing SETNX User.313123, inst: 49, mgr: Inactive, err: never, queue: 0, qu: 0, qs: 0, qc: 0, wr: 0, wq: 0, in: 0, ar: 0, clientName: PRD-VM-WEB-2, serverEndpoint: Unspecified/Construct3.redis.cache.windows.net:6380, keyHashSlot: 15649, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=1,Free=32766,Min=1,Max=32767) (Please take a look at this article for some common client-side issues that can cause timeouts: http://stackexchange.github.io/StackExchange.Redis/Timeouts)
   at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in c:\code\StackExchange.Redis\StackExchange.Redis\StackExchange\Redis\ConnectionMultiplexer.cs:line 2120
   at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in c:\code\StackExchange.Redis\StackExchange.Redis\StackExchange\Redis\RedisBase.cs:line 81

Example 2

StackExchange.Redis.RedisTimeoutException: Timeout performing GET ForumTopic.33831, inst: 1, mgr: Inactive, err: never, queue: 2, qu: 0, qs: 2, qc: 0, wr: 0, wq: 0, in: 0, ar: 0, clientName: PRD-VM-WEB-2, serverEndpoint: Unspecified/Construct3.redis.cache.windows.net:6380, keyHashSlot: 5851, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=1,Free=32766,Min=1,Max=32767) (Please take a look at this article for some common client-side issues that can cause timeouts: http://stackexchange.github.io/StackExchange.Redis/Timeouts)
   at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in c:\code\StackExchange.Redis\StackExchange.Redis\StackExchange\Redis\ConnectionMultiplexer.cs:line 2120
   at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in c:\code\StackExchange.Redis\StackExchange.Redis\StackExchange\Redis\RedisBase.cs:line 81
   at StackExchange.Redis.RedisDatabase.StringGet(RedisKey key, CommandFlags flags) in c:\code\StackExchange.Redis\StackExchange.Redis\StackExchange\Redis\RedisDatabase.cs:line 1647
   at C3.Code.Controls.Application.Caching.Distributed.DistributedCacheController.Get[T](String cacheKey) in C:\Construct.net\Source\C3Alpha2\Code\Controls\Application\Caching\Distributed\DistributedCacheController.cs:line 115
   at C3.Code.Controls.Application.Caching.Manager.Manager.Get[T](String key, Func`1 getFromExternFunction, Boolean skipLocalCaches) in C:\Construct.net\Source\C3Alpha2\Code\Controls\Application\Caching\Manager\Manager.cs:line 159
   at C3.PageControls.Forums.TopicRender.Page_Load(Object sender, EventArgs e) in C:\Construct.net\Source\C3Alpha2\PageControls\Forums\TopicRender.ascx.cs:line 40
   at System.Web.UI.Control.OnLoad(EventArgs e)
   at System.Web.UI.Control.LoadRecursive()
   at System.Web.UI.Control.LoadRecursive()
   at System.Web.UI.Control.LoadRecursive()
   at System.Web.UI.Control.LoadRecursive()
   at System.Web.UI.Control.LoadRecursive()
   at System.Web.UI.Control.LoadRecursive()
   at System.Web.UI.Control.LoadRecursive()
   at System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint)

These errors are sporadic, occurring several times a day.

Is this an Azure network blip, or something I can reduce? The numbers in the error don't seem out of the ordinary, and the server load never seems to go above 7% as reported by Azure.

Redis connection

internal static class RedisController
{
    private static readonly object GetConnectionLock = new object();
    public static ConnectionMultiplexer GetConnection()
    {
        if (Global.RedisConnection == null)
        {
            lock (GetConnectionLock)
            {
                if (Global.RedisConnection == null)
                {
                    Global.RedisConnection = ConnectionMultiplexer.Connect(
                        Settings.Deployment.RedisConnectionString);
                }
            }
        }
        return Global.RedisConnection;
    }
}
  • There are a few steps for investigating Azure Redis timeout issues that might help: azure.microsoft.com/en-us/blog/… Commented Aug 4, 2018 at 13:28
  • How many requests are you making per second? I wonder if there is rate throttling to prevent suspected DoS attacks. Have you tried running this on another service; aws, rackspace, local, to see if you still get timeouts? Commented Aug 19, 2018 at 20:58
  • @varlogtim can't test this on another infrastructure as it's being thrown in production - the dev server has never seen this error (I'm using Redis in dev as well). Commented Aug 20, 2018 at 9:20
  • @TomGullen - Could you post the code snippet where you open the redis client? Commented Aug 22, 2018 at 8:09
  • 1
    I'm beginning to think this is just a bug with the StackExchange.Redis client. I have the same problem and haven't gotten anywhere with it. It's causing a lot of problems on our production servers. Commented Aug 22, 2018 at 18:19

5 Answers


There are 3 scenarios that can cause timeouts, and it is hard to know which is in play:

  1. the library is tripping over; in particular, there are known issues relating to the TLS implementation and how we handle the read loop in the v1.* version of the library - something that we have invested a lot of time working on for v2.* (however: it is not always trivial to update to v2, especially if you're using the library as part of other code that depends on a specific version)
  2. the server/network is tripping over; this is a very real possibility - looking at "slowlog" can help if it is server-side, but I don't have any visibility of that (one way to pull the slow log from the client is sketched after this list)
  3. the server and network are fine, and the library is doing what it can, but there are some huge blobs flying between client and server that are delaying other operations; this is something that I'm making changes to help identify right now, and if this shows itself to be a common problem, we'll perhaps look at making better use of concurrent connections (which doesn't increase bandwidth, but can reduce latency for blocked operations) - this would be a v2 only change, note
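For scenario 2, one way to get a little server-side visibility from the application itself is to read the Redis slow log through the multiplexer. A minimal sketch; the helper name is made up, and the SlowlogGet/CommandTrace member names are from memory of the StackExchange.Redis API, so verify them against the version you are actually running:

using System;
using StackExchange.Redis;

// Hypothetical diagnostic helper: dumps recent SLOWLOG entries so you can see
// whether slow server-side commands line up with the client-side timeouts.
public static class SlowlogDump
{
    public static void Print(ConnectionMultiplexer muxer)
    {
        // Each configured endpoint maps to an IServer we can query.
        foreach (var endpoint in muxer.GetEndPoints())
        {
            var server = muxer.GetServer(endpoint);

            // SlowlogGet(n) wraps the SLOWLOG GET command (the threshold is the
            // server's slowlog-log-slower-than setting).
            foreach (var entry in server.SlowlogGet(10))
            {
                Console.WriteLine(
                    $"{entry.Time:u} took {entry.Duration.TotalMilliseconds} ms: " +
                    string.Join(" ", entry.Arguments));
            }
        }
    }
}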

3 Comments

Thanks for your help Marc. We've managed to reduce errors from ~50-100 daily to zero by setting WorkerThreads and CompletionPortThreads to 200 (from the default value, which I'm assuming is 1) in Application_Start with ThreadPool.SetMinThreads(200, 200). Guessing this fits into category 2?
@TomGullen hmmm... that might be "category 4" :) btw - I added some new support yesterday to help identify / call out "category 3" - github.com/StackExchange/StackExchange.Redis/commit/…
Thanks for your help Marc and wonderful libraries as always! Happy this now appears to be resolved from what I can observe for last 2 days. Just as a side note Azure support have analysed server/network performance and confirmed there was no unusual behaviour at the time errors were being thrown.

Lazy Connection

As a best practice, make sure you are using the following pattern to connect with the StackExchange.Redis client:

private static Lazy<ConnectionMultiplexer> lazyConnection = new Lazy<ConnectionMultiplexer>(() => {
    return ConnectionMultiplexer.Connect("cachename.redis.cache.windows.net,ssl=true,abortConnect=false,password=password");
});

public static ConnectionMultiplexer Connection {
    get {
        return lazyConnection.Value;
    }
}
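For completeness, a typical call site might then look like the sketch below; it assumes it lives in the same class as the Connection property above, and the key name is purely illustrative:

public static string GetCachedValue(string key)
{
    // GetDatabase() is cheap; the expensive ConnectionMultiplexer is created once by Lazy<T>.
    IDatabase db = Connection.GetDatabase();

    // RedisValue converts implicitly to string (null if the key does not exist).
    return db.StringGet(key);
}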

If the above does not work, there are some more debugging routes described in Source 1, regarding region, bandwidth and NuGet package versions among others.

IO Threads

Another option could be to increase the minimum number of IO threads. It's often recommended to set the minimum configuration value for IOCP and WORKER threads to something larger than the default. There is no one-size-fits-all guidance on what this value should be, because the right value for one application will be too high or too low for another. A good starting place is 200 or 300; then test and tweak as needed.

How to configure this setting:

  • In ASP.NET, use the minIoThreads configuration setting under the <processModel> configuration element in machine.config. According to Microsoft, you can't change this value per site by editing web.config (even though you could in the past), so the value you choose here is the value that all of your .NET sites will use. Note that you don't need to add every property when you set autoConfig to false; specifying autoConfig="false" and overriding the value is enough: <processModel autoConfig="false" minIoThreads="250" />

Important Note: the value specified in this configuration element is a per-core setting. For example, if you have a 4 core machine and want your minIOThreads setting to be 200 at runtime, you would use <processModel minIoThreads="50"/>.
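Alternatively, you can raise the minimums in code at application startup, which is essentially the workaround reported in the comments on the question; note that, unlike minIoThreads in machine.config, ThreadPool.SetMinThreads takes per-process values rather than per-core ones. A minimal sketch for an ASP.NET (Framework) Global.asax, assuming 200/200 as a starting point:

using System.Threading;
using System.Web;

public class Global : HttpApplication
{
    protected void Application_Start()
    {
        // Per-process minimums (not per-core): worker threads first, then IOCP threads.
        // 200/200 is only a starting point; measure and tune for your own workload.
        ThreadPool.SetMinThreads(workerThreads: 200, completionPortThreads: 200);
    }
}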

Sources:

  1. Microsoft Azure - Investigating timeout exceptions in StackExchange.Redis for Azure Redis Cache
  2. StackExchange.Redis

2 Comments

We've updated to this method but it's not changed anything for us, still same frequency of errors.
Why not use ThreadPool.SetMinThreads() in ASP.NET (Framework)?

My guess is that there is an issue with network stability - thus the timeouts.

Since nobody has mentioned increasing the responseTimeout, I would play around with it. The default value is 50ms, which can easily be reached. I would try around 200ms to see if that helps with the messages.

Taken from the configuration options:

responseTimeout={int} (ConfigurationOptions.ResponseTimeout, default: SyncTimeout) - Time (ms) to decide whether the socket is unhealthy
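If you want to experiment with timeouts, they can be set on ConfigurationOptions (or in the connection string). The sketch below uses a made-up factory class and placeholder host/password, and tunes syncTimeout, since responseTimeout was later made a no-op (see the update further down):

using StackExchange.Redis;

public static class RedisConnectionFactory
{
    public static ConnectionMultiplexer Create()
    {
        // Placeholder host/password; adjust to your own cache.
        var options = ConfigurationOptions.Parse(
            "yourcache.redis.cache.windows.net:6380,ssl=true,abortConnect=false,password=<secret>");

        // SyncTimeout is the time (ms) allowed for a synchronous operation (library default 5000).
        // On 1.x you could also tune options.ResponseTimeout, but per the update below it is
        // obsolete (a no-op) from v2.2.4 onwards.
        options.SyncTimeout = 2000;

        return ConnectionMultiplexer.Connect(options);
    }
}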

There are multiple issues open about this on GitHub. The one combining them all is probably #871, "The 'network stability' / 2.0 / 'pipelines' rollup issue".

One more thing: did you try playing around with ConnectionMultiplexer.ConnectAsync() instead of ConnectionMultiplexer.Connect()?
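If you do try ConnectAsync, a minimal async variation of the lazy pattern from the other answer might look like this (the connection string is a placeholder; Lazy<Task<T>> keeps connection creation to a single attempt):

using System;
using System.Threading.Tasks;
using StackExchange.Redis;

public static class RedisConnectionAsync
{
    // Lazy<Task<T>> ensures ConnectAsync runs only once; awaiting the task never blocks a thread.
    private static readonly Lazy<Task<ConnectionMultiplexer>> lazyConnection =
        new Lazy<Task<ConnectionMultiplexer>>(() =>
            ConnectionMultiplexer.ConnectAsync(
                "yourcache.redis.cache.windows.net:6380,ssl=true,abortConnect=false,password=<secret>"));

    public static Task<ConnectionMultiplexer> GetConnectionAsync() => lazyConnection.Value;
}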

14.12.2021 - An Update

In StackExchange.Redis v2.2.4, setting responseTimeout gives the following:

Warning CS0618 'ConfigurationOptions.ResponseTimeout' is obsolete: 'This setting no longer has any effect, and should not be used'

(Update contributed by MX313.)

1 Comment

FYI: in stackexchange.redis v2.2.4: the following is given for 'responseTimeout' : Warning CS0618 'ConfigurationOptions.ResponseTimeout' is obsolete: 'This setting no longer has any effect, and should not be used'

I was struggling with this timeout error for a while; the steps below resolved my problem:

First of all, I used Lazy<T> for my connection:

private static Lazy<ConnectionMultiplexer> lazyConnection = new Lazy<ConnectionMultiplexer>(() =>
{
    return ConnectionMultiplexer.Connect(new ConfigurationOptions
    {
        EndPoints = { Url },
        AbortOnConnectFail = false,
        Ssl = UseSsl,
        Password = Password,
    });
});

public static ConnectionMultiplexer Connection => lazyConnection.Value;

Second, I updated all my async methods to sync. For example, where I was using StringGetAsync, I replaced it with StringGet.
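In other words, the change was along these lines (the key name is illustrative; note the sync call blocks the calling thread, which is part of why the thread-pool change below matters):

using System.Threading.Tasks;
using StackExchange.Redis;

public static class SyncVsAsyncExample
{
    // Before: asynchronous read.
    public static async Task<string> GetTopicAsync(IDatabase db) =>
        await db.StringGetAsync("ForumTopic.33831");   // illustrative key

    // After: synchronous read (blocks the calling thread until Redis responds).
    public static string GetTopic(IDatabase db) =>
        db.StringGet("ForumTopic.33831");
}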

Third, I changed the minimum number of threads:

public static async Task Main(string[] args)
{
   ...
   SetupThreadPool();
}

private static void SetupThreadPool()
{
    // Read the pool's maximum thread counts and use them as the minimums,
    // so the thread pool doesn't have to ramp up slowly under bursty load.
    ThreadPool.GetMaxThreads(out var workerThreads, out var completionPortThreads);
    ThreadPool.SetMinThreads(workerThreads, completionPortThreads);
}

I tested my API with bombardier (-d 10s -c 125) and reached a zero-error state; below is the benchmark:

[benchmark screenshot]

As you can see, all the requests are handled by the application successfully!

I hope it helps. Good luck.



Have the network traffic monitor switched on to confirm/deny the blip. I have a solution to the issue, but a crude one: try restarting the managed Redis instance in Azure.

