I am confused about the difference between WaitGroup and ErrGroup when it comes to context cancellation. The reason is that I keep reading statements like the following in documentation and articles. For example, from here:
ErrGroup: Allows graceful cancellation of goroutines when an error occurs with context.
WaitGroup: No built-in cancellation; you must manually stop goroutines.
And yet I haven't been able to find an example that makes this difference clear to me. I will try to write some code.
ErrGroup usage:
func run(ctx context.Context) error {
    grpcServer := grpc.NewServer()
    service := service.NewSayHelloService()
    port := ":50051"
    group, ctx := errgroup.WithContext(ctx)
    group.Go(func() error {
        <-ctx.Done()
        logger.Info().Msg("context has been cancelled")
        grpcServer.GracefulStop()
        return ctx.Err()
    })
    group.Go(func() error {
        lis, err := net.Listen("tcp", port)
        if err != nil {
            logger.Error().Err(err).Msgf("failed to listen on port %s", port)
            return err
        }
        logger.Info().Msg("starting grpc server")
        err = grpcServer.Serve(lis)
        if err != nil {
            logger.Error().Err(err).Msg("failed to start server")
            return err
        }
        return nil
    })
    // Waiting on both goroutines to finish or error out before returning the control to main() function.
    err := group.Wait()
    if err != nil {
        logger.Error().Err(err).Msg("server failing on waiting goroutines")
    }
    return err
}
As I understand it, since I am using ErrGroup, if the goroutine that starts the server fails for some reason, the context will be cancelled and the cancellation propagated automatically to any other goroutines listening on it.
Since the other goroutine is listening through <-ctx.Done(), it will unblock and execute its code.
If that goroutine were not listening through <-ctx.Done() but simply doing its own work, it would keep going until it finished or returned an error.
group.Wait() will return only the first non-nil error returned by any of the goroutines.
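The point about goroutines that do not listen on <-ctx.Done() can be checked with the standard library alone. The following is only a minimal sketch, not code from either article: countSteps and demo are hypothetical names, and the context is cancelled up front so the result is deterministic.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// countSteps runs up to n short steps and reports how many it completed.
// If honorCtx is true, it checks ctx before every step and stops once ctx
// is cancelled. If honorCtx is false, it never looks at ctx at all.
func countSteps(ctx context.Context, n int, honorCtx bool) int {
	done := 0
	for i := 0; i < n; i++ {
		if honorCtx {
			select {
			case <-ctx.Done():
				return done
			default:
			}
		}
		time.Sleep(time.Millisecond)
		done++
	}
	return done
}

// demo starts one worker that listens on the context and one that ignores
// it, then returns how many steps each of them completed.
func demo() (listening, ignoring int) {
	ctx, cancel := context.WithCancel(context.Background())
	cancel() // cancel before the workers start

	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		listening = countSteps(ctx, 5, true)
	}()
	go func() {
		defer wg.Done()
		ignoring = countSteps(ctx, 5, false)
	}()
	wg.Wait()
	return listening, ignoring
}

func main() {
	listening, ignoring := demo()
	fmt.Println("steps by the worker that checks ctx:", listening)
	fmt.Println("steps by the worker that ignores ctx:", ignoring)
}
```

Cancelling a context only closes the Done channel. The worker that checks it stops without doing any steps, while the worker that ignores it still completes all five.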
Now for WaitGroup, here is an example that I took from this article:
func run(ctx context.Context) error {
    // Retrieve the logger from the context
    logger := zerolog.Ctx(ctx)
    // Create new http.ServeMux,
    // register the routes and handlers
    // and initialize the server
    router := web.NewRouter(logger)
    httpServer := &http.Server{
        Addr:    net.JoinHostPort(config.Host, config.Port),
        Handler: router,
    }
    // Start the server in a goroutine
    go func() {
        logger.Info().Msgf("listening on %s\n", httpServer.Addr)
        if err := httpServer.ListenAndServe(); err != nil && err != http.ErrServerClosed {
            logger.Error().Err(err).Msg("error listening and serving")
        }
    }()
    // Wait for the context to be cancelled
    var wg sync.WaitGroup
    wg.Add(1)
    go func() {
        defer wg.Done()
        <-ctx.Done()
        // Create a new context to wait for the shutdown with a timeout
        shutdownCtx := context.Background()
        shutdownCtx, cancel := context.WithTimeout(shutdownCtx, config.GracefulShutdownTimeout*time.Second)
        defer cancel()
        // Shutdown the server
        if err := httpServer.Shutdown(shutdownCtx); err != nil {
            logger.Error().Err(err).Msg("error shutting down http server")
        }
    }()
    wg.Wait()
    return nil
}
What exactly is happening here and why would this be different from the ErrGroup example?
As I understand it, there are still two concurrent goroutines: one starting the server and the other waiting for the context to be cancelled, ready to gracefully shut the server down.
Why wouldn't it be possible to gracefully cancel goroutines with WaitGroup, as the article I mentioned at the beginning of this post says? As far as I can tell, that is exactly what is happening here. Where are we "manually" stopping a goroutine because WaitGroup doesn't support it automatically? Thanks in advance!
If you open the errgroup package
type Group struct {
    cancel  func(error)
    wg      sync.WaitGroup
    sem     chan token
    errOnce sync.Once
    err     error
}
You can see that a WaitGroup is used internally, as the field wg sync.WaitGroup. That means errgroup is a wrapper around WaitGroup that makes it easier to use: it adds error capture and a cancel function on top.
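To make the wrapping concrete, here is a rough sketch of the same idea built only from the standard library. It is not the real errgroup source: miniGroup, withContext, errBoom and demo are hypothetical names, the concurrency limit (sem) is left out, and plain context.WithCancel is assumed in place of the cancel func(error) field.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// miniGroup is a hypothetical, stripped-down imitation of errgroup.Group:
// a WaitGroup plus "remember the first error and cancel the context".
type miniGroup struct {
	cancel  context.CancelFunc
	wg      sync.WaitGroup
	errOnce sync.Once
	err     error
}

// withContext returns a group and a context that is cancelled as soon as
// any function passed to Go returns a non-nil error, or when Wait returns.
func withContext(parent context.Context) (*miniGroup, context.Context) {
	ctx, cancel := context.WithCancel(parent)
	return &miniGroup{cancel: cancel}, ctx
}

// Go runs f in a new goroutine. The first non-nil error is stored and
// triggers cancellation. This is the part you write by hand when you use
// a bare sync.WaitGroup.
func (g *miniGroup) Go(f func() error) {
	g.wg.Add(1)
	go func() {
		defer g.wg.Done()
		if err := f(); err != nil {
			g.errOnce.Do(func() {
				g.err = err
				g.cancel()
			})
		}
	}()
}

// Wait blocks until every goroutine has returned, then returns the first error.
func (g *miniGroup) Wait() error {
	g.wg.Wait()
	g.cancel()
	return g.err
}

var errBoom = errors.New("boom")

// demo runs one goroutine that fails at once and one that only waits on
// ctx. It reports whether the second one observed the cancellation.
func demo() (sawCancel bool, err error) {
	g, ctx := withContext(context.Background())
	g.Go(func() error { return errBoom })
	g.Go(func() error {
		select {
		case <-ctx.Done():
			sawCancel = true
			return ctx.Err()
		case <-time.After(5 * time.Second):
			return nil
		}
	})
	err = g.Wait()
	return sawCancel, err
}

func main() {
	sawCancel, err := demo()
	fmt.Println("Wait returned:", err)
	fmt.Println("second goroutine saw cancellation:", sawCancel)
}
```

In this sketch the only "automatic" part is the call to g.cancel() inside Go. A goroutine still has to watch ctx.Done() itself to react to it, which is what both examples in the question do.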
I think reading through the errgroup package source is the best way to answer this question with concrete examples.