
Google Cloud Stackdriver Debugger - production debugging?


How does Stackdriver debug applications that are running in production?

Will the server be down while it is being debugged? How much latency does debugging add?

Is there a way to debug an incident that has 'already happened'? For example, I have an application running in production and there was an issue: say, I wasn't able to add an item to the shopping cart. Can we go back and debug that issue, or does it only debug the live application?


Solution

  • Stackdriver Debugger is an always-on, whole-service debugger. You don't debug just a single server/VM but rather all of the servers belonging to the same service, at the same time. It captures the call stack and variables from a single server when the condition hits and then cancels the snapshot on all other servers.

    The Stackdriver Debugger agent doesn't stop the process; it briefly pauses the thread that hits the snapshot line and satisfies the condition. The thread is usually paused for about 3 ms to capture ~64K of information, though your timing may vary.

    Stackdriver Debugger agents are written from scratch with the purpose of optimizing for application latency. They use all sorts of tricks to avoid pausing the running thread/server (e.g., serialization of the data happens after the thread is released).

    Stackdriver Debugger is a real-time, interactive debugger. There is really no way to debug something that happened in the past. However, since it's a production debugger, you can set your snapshot location in production and wait for the event to happen again.

    One other feature of Stackdriver Debugger that you might find useful is logpoints. These are log statements that you can insert dynamically into your application with a specific case/condition in mind. You don't have to make code changes or redeploy your service. See the blog post.
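
To make the snapshot behaviour described above more concrete, here is a rough toy model in plain Python. It is not the Stackdriver Debugger agent or its API: the `ToySnapshot` class, its `hit` and `flush` methods, and the `add_to_cart` function are all hypothetical names invented for this sketch. It only illustrates three ideas from the answer: the snapshot fires on a condition, only the first hit is captured (later hits are ignored, loosely like the snapshot being cancelled elsewhere), and serialization happens after the request path has moved on.

```python
import copy
import json
import queue
import threading


class ToySnapshot:
    """Hypothetical toy model of a conditional snapshot (not the real agent)."""

    def __init__(self, condition):
        self.condition = condition     # callable that receives the local variables
        self._lock = threading.Lock()
        self._done = False             # True once the first matching hit is captured
        self._pending = queue.Queue()  # raw captures waiting to be serialized
        self.results = []              # serialized snapshots (JSON strings)

    def hit(self, local_vars):
        """Called on the request thread at the snapshot line."""
        if self._done or not self.condition(local_vars):
            return False
        with self._lock:
            if self._done:             # another thread captured first
                return False
            self._done = True
        # Copy the state now so later mutations don't change the snapshot.
        # A real agent would bound how much it copies; this toy copies everything.
        self._pending.put(copy.deepcopy(local_vars))
        return True

    def flush(self):
        """Serialize captured state later, away from the request path."""
        while not self._pending.empty():
            raw = self._pending.get()
            self.results.append(json.dumps(raw, default=repr, sort_keys=True))


def add_to_cart(cart, item, snap):
    # The "snapshot line": the request keeps running whether or not it fires.
    snap.hit({"cart": cart, "item": item})
    cart.append(item)
    return cart


# Capture state only when the suspicious case occurs: an item that is None.
snap = ToySnapshot(lambda v: v["item"] is None)
cart = []
add_to_cart(cart, "book", snap)   # condition false, nothing captured
add_to_cart(cart, None, snap)     # first match, captured
add_to_cart(cart, None, snap)     # snapshot already taken, ignored
snap.flush()
print(snap.results)
```

Note that the requests themselves are never blocked or altered: all three calls complete normally, and the captured state reflects the cart as it was at the moment of the first matching hit.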