
I'm using Unix sockets with fluentd for a logging task and find that, once in a while and seemingly at random, I get the following error:

dial: {socket_name} resource temporarily unavailable

Any ideas as to why this might be occurring?

I tried adding "retry" logic, to reduce the error, but it still occurs at times.

Also, on the fluentd side we are using the default config for Unix socket communication.

func connect() {
    var connection net.Conn
    var err error
    for i := 0; i < retry_count; i++ {
        connection, err = net.Dial("unix", path_to_socket)
        if err == nil {
            break
        }
        time.Sleep(time.Duration(math.Exp2(float64(retry_count))) * time.Millisecond)
    }
    if err != nil {
        fmt.Println(err)
    } else {
        connection.Write(data_to_send_socket)
    }
    defer connection.Close()
}
  • Presumably that Dial function is calling connect(2), which can return EAGAIN if the routing cache is full. You're probably overloading the socket stack. Try switching to exponential backoff (i.e. double the retry_duration each time you retry). Commented May 14, 2015 at 3:05
  • Might the server process's listen queue be full? Commented May 14, 2015 at 4:41

2 Answers


Go creates its sockets in non-blocking mode, which means that certain system calls that would usually block instead return immediately with an error. In most cases Go transparently handles the EAGAIN error (which is what the "resource temporarily unavailable" message indicates) by waiting until the socket is ready to read/write. It doesn't seem to have this logic for the connect call in Dial though.

It is possible for connect to return EAGAIN when connecting to a UNIX domain socket if its listen queue has filled up. This will happen if clients are connecting to it faster than it is accepting them. Go should probably wait on the socket until it becomes connectable in this case and retry similar to what it does for Read/Write, but it doesn't seem to have that logic.

So your best bet would be to handle the error by waiting and retrying the Dial call. That, or work out why your server isn't accepting connections in a timely manner.


For the exponential backoff you can use this library: github.com/cenkalti/backoff. I think the way you have it now it always sleeps for the same amount of time.

For the network error you need to check if it's a temporary error or not. If it is then retry:

type TemporaryError interface {
    Temporary() bool
}

func dial() (conn net.Conn, err error) {
    backoff.Retry(func() error {
        conn, err = net.Dial("unix", "/tmp/ex.socket")
        if err != nil {
            // if this is a temporary error, then retry
            if terr, ok := err.(TemporaryError); ok && terr.Temporary() {
                return err
            }
        }
        // on success or a non-temporary error, return nil to stop retrying;
        // the caller still sees the outcome through the named results conn and err
        return nil
    }, backoff.NewExponentialBackOff())
    return
}
