
I'm writing a process monitor in Go that periodically collects information from /proc/ using multiple goroutines. Each goroutine sends errors to a buffered channel if it fails (for example, if the process exits). After all goroutines finish, I check the error channel and want to print the error and stop monitoring if any error occurred.

However, when the monitored process exits, my program just stops tracing but does not print the error message as expected. Here’s a simplified version of my controller loop:

package controller

import (
    "fmt"
    "sync"
    "time"

    "github.com/chahatsagarmain/go-ptrack/internal/process"
    "github.com/chahatsagarmain/go-ptrack/internal/ptracker"
)


func ControllerStart(pid int, _ int, p *process.Process) error {
    var wg sync.WaitGroup

    errChan := make(chan error, 7)

    for {
        t := time.Now().Truncate(time.Second)
        fmt.Printf("\n=== Trace at %v ===\n", t)
        p.Mu.Lock()
        p.Logs[t] = process.ProcessInfo{
            PID: pid,
        }
        p.Mu.Unlock()
        var res string
        var err error

        wg.Add(1)
        go func(tn time.Time) {
            var resInt int
            resInt, err = ptracker.GetStatus(pid)
            if err != nil || resInt == 0 {
                errChan <- fmt.Errorf("status 0 for process")
                return
            }
            fmt.Printf("%v\n", resInt)
            p.Mu.Lock()
            info := p.Logs[tn]
            info.Status = resInt
            p.Logs[tn] = info
            p.Mu.Unlock()
            wg.Done()
        }(t)

        wg.Add(1)
        go func(tn time.Time) {
            res, err = ptracker.GetCommandLine(pid)
            if err != nil {
                errChan <- err
                return
            }
            fmt.Printf("%v\n", res)
            p.Mu.Lock()
            info := p.Logs[tn]
            info.Cmdline = res
            p.Logs[tn] = info
            p.Mu.Unlock()
            wg.Done()
        }(t)

        wg.Add(1)
        go func(tn time.Time) {
            res, err = ptracker.GetCwd(pid)
            if err != nil {
                errChan <- err
                return
            }
            fmt.Printf("%v\n", res)
            p.Mu.Lock()
            info := p.Logs[tn]
            info.CWD = res
            p.Logs[tn] = info
            p.Mu.Unlock()
            wg.Done()
        }(t)

        wg.Add(1)
        go func(tn time.Time) {
            res, err = ptracker.GetExe(pid)
            if err != nil {
                errChan <- err
                return
            }
            fmt.Printf("%v\n", res)
            p.Mu.Lock()
            info := p.Logs[tn]
            info.EXE = res
            p.Logs[tn] = info
            p.Mu.Unlock()
            wg.Done()
        }(t)

        wg.Add(1)
        go func(tn time.Time) {
            res, err = ptracker.GetIO(pid)
            if err != nil {
                errChan <- err
                return
            }
            fmt.Printf("%v\n", res)
            p.Mu.Lock()
            info := p.Logs[tn]
            info.IO = res
            p.Logs[tn] = info
            p.Mu.Unlock()
            wg.Done()
        }(t)

        wg.Add(1)
        go func(tn time.Time) {
            res, err = ptracker.GetSysCall(pid)
            if err != nil {
                errChan <- err
                return
            }
            fmt.Printf("%v\n", res)
            p.Mu.Lock()
            info := p.Logs[tn]
            info.SYSCALL = res
            p.Logs[tn] = info
            p.Mu.Unlock()
            wg.Done()
        }(t)

        wg.Add(1)
        go func(tn time.Time) {
            res, err = ptracker.GetMem(pid)
            if err != nil {
                errChan <- err
                return
            }
            fmt.Printf("%v\n", res)
            p.Mu.Lock()
            info := p.Logs[tn]
            info.MEM = res
            p.Logs[tn] = info
            p.Mu.Unlock()
            wg.Done()
        }(t)

        wg.Wait()

        p.Mu.Lock()
        fmt.Printf("=== Completed trace %d at %v ===\n", len(p.Logs), t)
        p.Mu.Unlock()

        select {
        case err := <-errChan:
            fmt.Printf("process monitoring error: %v\n", err)
            return fmt.Errorf("process monitoring error: %v", err)
        default:
            // No error, continue
        }

        time.Sleep(time.Second)
    }
}

With this code, once the process terminates, the first goroutine (the status one) sends an error to the error channel and logging stops, but the error message is never printed. Here is the output:

=== Trace at 2025-05-18 01:55:58 +0530 IST ===
1
887384 110120 57329 75 0 123413 0

/usr/lib64/firefox/firefox
rchar: 84557623
wchar: 64529368
syscr: 27177
syscw: 14467
read_bytes: 0
write_bytes: 68263936
cancelled_write_bytes: 122880

7 0x7f1c73bfdd00 0x4 0x1f2 0x0 0x0 0x0 0x7ffe7cb2e708 0x7f1cb44876c2

/home/chahat
/usr/lib64/firefox/firefox
=== Completed trace 7 at 2025-05-18 01:55:58 +0530 IST ===

=== Trace at 2025-05-18 01:55:59 +0530 IST ===
1
0 0 0 0 0 0 0


tracing.....
traces generated : 8
tracing.....
traces generated : 8
tracing.....
traces generated : 8
tracing.....
traces generated : 8

Here, the "traces generated" lines come from another goroutine that continuously monitors the size of Logs. The WaitGroup appears to be stuck, but I never see the error message, and the function never returns the error.


When a goroutine hits an error, it sends the error to errChan and returns early, but it never calls wg.Done(). Because errChan is buffered, the send succeeds immediately and the goroutine exits, yet the WaitGroup counter never drops back to zero. wg.Wait() therefore blocks forever, and the select that reads errChan is never reached, which is why the error message is never printed.

This applies to every goroutine in the loop, not only the GetStatus one: each of them returns before reaching wg.Done() on its error path, so the program hangs as soon as any /proc read fails.
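A minimal, self-contained reproduction of this behavior is sketched below. It assumes a hypothetical failingFetch in place of the ptracker calls, and a hypothetical helper waitsWithin that reports whether wg.Wait() returns within a timeout, so the demo ends instead of hanging like the real loop.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// failingFetch is a hypothetical stand-in for a ptracker call on a
// process that has already exited.
func failingFetch() error { return errors.New("process exited") }

// waitsWithin reports whether wg.Wait() returns before timeout. If Wait
// never returns, the helper goroutine stays blocked; acceptable for a demo.
func waitsWithin(wg *sync.WaitGroup, timeout time.Duration) bool {
	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		return true
	case <-time.After(timeout):
		return false
	}
}

// buggyRound mirrors one goroutine from the question's loop body.
func buggyRound() bool {
	var wg sync.WaitGroup
	errChan := make(chan error, 1) // buffered: the send below never blocks

	wg.Add(1)
	go func() {
		if err := failingFetch(); err != nil {
			errChan <- err
			return // early return: the wg.Done() below is skipped
		}
		wg.Done()
	}()

	return waitsWithin(&wg, 200*time.Millisecond)
}

func main() {
	fmt.Println("wg.Wait returned:", buggyRound()) // prints "wg.Wait returned: false"
}
```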

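The fix is to defer wg.Done() at the top of each goroutine, so it runs on every return path, including the early error return:

wg.Add(1)
go func(tn time.Time) {
    defer wg.Done() // add this; runs even when we return early
    resInt, err := ptracker.GetStatus(pid) // local err, not the shared one
    if err != nil || resInt == 0 {
        errChan <- fmt.Errorf("status 0 for process")
        return
    }
    p.Mu.Lock()
    // ... update p.Logs[tn] as before ...
    p.Mu.Unlock()
}(t)

Remember to delete the explicit wg.Done() at the end of each goroutine; otherwise Done runs twice on the success path and the WaitGroup panics with a negative counter.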

Using errgroup solved the issue for me. This pattern of error handling is quite complex and should be avoided, especially in my case, where I just needed an early return.

