docker · kubernetes · google-tag-manager · google-tag-manager-server-side

Kubernetes Google Server-side tagging memory leak troubleshooting


Recently, we deployed Google's server-side tagging solution on a Kubernetes cluster. Google provides a Docker image for it, which we run in our managed Kubernetes cluster on DigitalOcean.

We started observing that memory usage was gradually increasing over time. After analysing the problem further and running some tests, we suspect the containers running the Docker image mentioned above are the cause. Below you can see that restarting the containers resulted in a significant drop in memory usage; the memory usage reported by kubectl top pods -A showed the same drop.

Restarting the containers with the given Docker image
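For reference, a minimal sketch of the commands involved in checking per-pod memory and restarting the workload; the namespace and deployment name below are placeholders, not our actual resource names:

```
# Cluster-wide per-pod memory usage (requires metrics-server)
kubectl top pods -A

# Per-container breakdown for the tagging pods only (namespace is a placeholder)
kubectl top pods --namespace tagging --containers

# Restart the pods by rolling the deployment (deployment name is a placeholder)
kubectl rollout restart deployment/gtm-server --namespace tagging
```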

We have already set resource limits on the deployments running this image to prevent them from growing until the nodes run out of resources. However, we did not expect this behavior from this image. Does anyone know whether this is a common problem with this image? Could there be other causes, such as cluster settings? And what would be the best practice for handling this potential memory leak? One option we are currently considering is scheduling a rollout restart every 24 hours (see the sketch below).
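For anyone weighing the same workaround, here is a rough sketch of a scheduled restart created with kubectl create cronjob. The namespace, deployment name, schedule, and image are assumptions to adapt; note that the CronJob's service account needs RBAC permission to patch deployments, which the default service account does not have.

```
# All names and the schedule below are placeholders.
# The CronJob's service account must be bound to a Role allowing "patch" on
# deployments; you may need to edit the generated CronJob to set serviceAccountName.
kubectl create cronjob gtm-restart \
  --namespace tagging \
  --schedule="0 4 * * *" \
  --image=bitnami/kubectl:latest \
  -- kubectl rollout restart deployment/gtm-server --namespace tagging
```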


Solution

  • The problem resolved itself. It appeared to be introduced with the June 6/7, 2023 release of Google Tag Manager and was fixed with the June 13, 2023 release.