sre

How Google Runs Production Systems: what does the "50% time for project work" really mean for SREs?


Quote: "SREs at 50% of their time. Their remaining time should be spent using their coding skills on project work." (page 7)

I'm reading this book, and I really can't understand this part.

What is "project work"?

Is it production code or Ansible YAML?


Solution

  • SRE @Google here.

    This means an SRE should spend at least 50% of their time on project work. In other words, an SRE should spend at most 50% of their time on operational work. If operational work consumes more than 50%, it's a signal that the associated production stack has room for automation, which means undertaking more projects.

    Operational work includes handling interruptions and alerts in production, managing service provisioning, and any other toilsome production work. Project work includes developing a monitoring system, creating a CI/CD pipeline, or deploying next-generation global load balancers, reverse proxy servers, etc.

    It is a key SRE philosophy at Google that each team should spend at most 50% of its time on operational work. As a service grows, the team needs to undertake projects so that operational needs don't grow disproportionately in the future. Project work aims to address problems early so that they don't lead to operational work taking up more than 50% of SRE time.
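
To make the 50% rule concrete, here is a minimal sketch of the arithmetic. It is only an illustration, not a tool Google uses: the `TimeLog` class, its field names, and the sample hours are all hypothetical.

```python
from dataclasses import dataclass

OPS_CAP = 0.50  # the 50% cap on operational work described above


@dataclass
class TimeLog:
    """Hypothetical record of one SRE's (or one team's) hours for a period."""
    ops_hours: float      # on-call, tickets, manual toil
    project_hours: float  # automation, tooling, other engineering work

    def ops_share(self) -> float:
        """Fraction of logged time spent on operational work."""
        total = self.ops_hours + self.project_hours
        return self.ops_hours / total if total else 0.0

    def over_cap(self) -> bool:
        """True when operational work exceeds the 50% cap."""
        return self.ops_share() > OPS_CAP


# Made-up numbers: 26 hours of ops work and 14 hours of project work.
week = TimeLog(ops_hours=26, project_hours=14)
print(f"ops share: {week.ops_share():.0%}, over cap: {week.over_cap()}")
```

With these sample numbers the ops share is 65%, so the check flags the week as over the cap. In the terms of the answer above, that is the signal to take on more project work (automation, better tooling) until the share drops back under 50%.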