
Using context with cancel, goroutine doesn't terminate


I'm new to Go and to concurrency in Go. I'm trying to use a Go context to cancel a set of goroutines once I find a member with a given ID.

A Group stores a list of Clients, and each Client has a list of Members. I want to search all the Clients and all their Members in parallel to find a Member with a given ID. Once this Member is found, I want to cancel all the other goroutines and return the discovered Member.

I've tried the following implementation, using a context.WithCancel and a WaitGroup.

However, this doesn't work: it hangs indefinitely and never gets past the line waitGroup.Wait(), and I'm not sure why.

func (group *Group) MemberWithID(ID string) (*models.Member, error) {
    found := make(chan *models.Member)
    ctx := context.Background()
    ctx, cancel := context.WithCancel(ctx)
    defer cancel()
    var waitGroup sync.WaitGroup

    for _, client := range group.Clients {
        waitGroup.Add(1)

        go func(clientToQuery Client) {
            defer waitGroup.Done()

            select {
            case <-ctx.Done():
                return
            default:
            }

            member, _ := clientToQuery.ClientMemberWithID(ID)
            if member != nil {
                found <- member
                cancel()
                return
            }

        } (client)

    }

    waitGroup.Wait()

    if len(found) > 0 {
        return <-found, nil
    }

    return nil, fmt.Errorf("no member found with given id")
}

Solution

  • found is an unbuffered channel, so sending on it blocks until there is someone ready to receive from it.

    MemberWithID() is the one that would receive from it, but only after waitGroup.Wait() returns. That call blocks until every launched goroutine has called waitGroup.Done(), which only happens when the goroutine returns, and a goroutine that found the member cannot return until its send on found completes. It's a deadlock.

    If you change found to be buffered, goroutines can send values on it even when MemberWithID() is not ready to receive (up to the capacity of the buffer). Buffering also makes the len(found) > 0 check meaningful: len() of an unbuffered channel is always 0.

    Better still, receive from found before waitGroup.Wait() returns, rather than after it.

    Another solution is to use a buffer of 1 for found, and use a non-blocking send on found. That way the first (fastest) goroutine will be able to send the result, and the rest (given we're using a non-blocking send) will simply skip sending.

    Also note that it should be MemberWithID() that calls cancel(), not each launched goroutine individually.
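
The buffered-channel variant described above could look roughly like the following sketch. Member, Client, and Group here are simplified stand-ins for the types in the question, and the ctx parameter on ClientMemberWithID is an assumption (the method in the question takes only the ID). The extra goroutine that closes found once all workers finish is one possible way to signal "nobody found it", not something the answer prescribes.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// Member, Client and Group are hypothetical stand-ins for the question's types.
type Member struct{ ID string }

type Client struct{ Members []Member }

type Group struct{ Clients []Client }

// ClientMemberWithID scans one client's members and gives up early if ctx is
// cancelled. The ctx parameter is an assumption made for this sketch.
func (c Client) ClientMemberWithID(ctx context.Context, id string) (*Member, error) {
	for i := range c.Members {
		if err := ctx.Err(); err != nil {
			return nil, err
		}
		if c.Members[i].ID == id {
			return &c.Members[i], nil
		}
	}
	return nil, errors.New("not found in this client")
}

func (g *Group) MemberWithID(id string) (*Member, error) {
	// Buffer of 1: the first successful send never blocks.
	found := make(chan *Member, 1)

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel() // the caller cancels, not the workers

	var wg sync.WaitGroup
	for _, client := range g.Clients {
		wg.Add(1)
		go func(c Client) {
			defer wg.Done()
			member, err := c.ClientMemberWithID(ctx, id)
			if err != nil || member == nil {
				return
			}
			// Non-blocking send: if the buffer is already full, skip.
			select {
			case found <- member:
			default:
			}
		}(client)
	}

	// Close found once every worker has finished, so the receive below
	// cannot block forever when no member matches.
	go func() {
		wg.Wait()
		close(found)
	}()

	// Receive before all workers are done; ok is false only if the channel
	// was closed without any value having been sent.
	if member, ok := <-found; ok {
		return member, nil
	}
	return nil, errors.New("no member found with given id")
}

func main() {
	g := &Group{Clients: []Client{
		{Members: []Member{{ID: "a"}, {ID: "b"}}},
		{Members: []Member{{ID: "c"}}},
	}}
	member, err := g.MemberWithID("c")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("found member", member.ID)
}
```

Because found is only closed after wg.Wait() returns, no worker can send on a closed channel, and the deferred cancel() lets any still-running workers stop early once a result has been returned.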