fastapiazure-application-insightsazure-monitoropencensus

Trace failed fastapi requests with opencensus


I'm using opencensus-python to track requests to my python fastapi application running in production, and exporting the information to Azure AppInsights using the opencensus exporters. I followed the Azure Monitor docs and was helped out by this issue post which puts all the necessary bits in a useful middleware class.

Only to realize later on that requests that caused the app to crash, i.e. unhandled 5xx type errors, would never be tracked, since the call to execute the logic for the request fails before any tracing happens. The Azure Monitor docs only talk about tracking exceptions through the logs, but this is separate from the tracing of requests, unless I'm missing something. I certainly wouldn't want to lose out on failed requests, these are super important to track! I'm accustomed to using the "Failures" tab in app insights to monitor any failing requests.

I figured the way to track these requests is to explicitly handle any internal exceptions using try/catch and export the trace, manually setting the result code to 500. But I found it really odd that there seems to be no documentation of this, on opencensus or Azure.

The problem I have now is: this middleware function is expected to pass back a "response" object, which fastapi then uses as a callable object down the line (not sure why) - but in the case where I caught an exception in the underlying processing (i.e. at await call_next(request)) I don't have any response to return. I tried returning None but this just causes further exceptions down the line (None is not callable).

Here is my version of the middleware class - its very similar to the issue post I linked, but I'm try/catching over await call_next(request) rather than just letting it fail unhanded. Scroll down to the final 5 lines of code to see that.

import logging

from fastapi import Request
from opencensus.trace import (
    attributes_helper,
    execution_context,
    samplers,
)
from opencensus.ext.azure.trace_exporter import AzureExporter
from opencensus.trace import span as span_module
from opencensus.trace import tracer as tracer_module
from opencensus.trace import utils
from opencensus.trace.propagation import trace_context_http_header_format
from opencensus.ext.azure.log_exporter import AzureLogHandler
from starlette.types import ASGIApp

from src.settings import settings

HTTP_HOST = attributes_helper.COMMON_ATTRIBUTES["HTTP_HOST"]
HTTP_METHOD = attributes_helper.COMMON_ATTRIBUTES["HTTP_METHOD"]
HTTP_PATH = attributes_helper.COMMON_ATTRIBUTES["HTTP_PATH"]
HTTP_ROUTE = attributes_helper.COMMON_ATTRIBUTES["HTTP_ROUTE"]
HTTP_URL = attributes_helper.COMMON_ATTRIBUTES["HTTP_URL"]
HTTP_STATUS_CODE = attributes_helper.COMMON_ATTRIBUTES["HTTP_STATUS_CODE"]

module_logger = logging.getLogger(__name__)
module_logger.addHandler(AzureLogHandler(
    connection_string=settings.appinsights_connection_string
))

class AppInsightsMiddleware:
    """
    Middleware class to handle tracing of fastapi requests and exporting the data to AppInsights. 
    
    Most of the code here is copied from a github issue: https://github.com/census-instrumentation/opencensus-python/issues/1020
    """
    def __init__(
        self,
        app: ASGIApp,
        excludelist_paths=None,
        excludelist_hostnames=None,
        sampler=None,
        exporter=None,
        propagator=None,
    ) -> None:
        self.app = app
        self.excludelist_paths = excludelist_paths
        self.excludelist_hostnames = excludelist_hostnames
        self.sampler = sampler or samplers.AlwaysOnSampler()
        self.propagator = (
            propagator or trace_context_http_header_format.TraceContextPropagator()
        )
        self.exporter = exporter or AzureExporter(
            connection_string=settings.appinsights_connection_string
        )

    async def __call__(self, request: Request, call_next):

        # Do not trace if the url is in the exclude list
        if utils.disable_tracing_url(str(request.url), self.excludelist_paths):
            return await call_next(request)

        try:
            span_context = self.propagator.from_headers(request.headers)

            tracer = tracer_module.Tracer(
                span_context=span_context,
                sampler=self.sampler,
                exporter=self.exporter,
                propagator=self.propagator,
            )
        except Exception:
            module_logger.error("Failed to trace request", exc_info=True)
            return await call_next(request)

        try:
            span = tracer.start_span()
            span.span_kind = span_module.SpanKind.SERVER
            span.name = "[{}]{}".format(request.method, request.url)
            tracer.add_attribute_to_current_span(HTTP_HOST, request.url.hostname)
            tracer.add_attribute_to_current_span(HTTP_METHOD, request.method)
            tracer.add_attribute_to_current_span(HTTP_PATH, request.url.path)
            tracer.add_attribute_to_current_span(HTTP_URL, str(request.url))
            execution_context.set_opencensus_attr(
                "excludelist_hostnames", self.excludelist_hostnames
            )
        except Exception:  # pragma: NO COVER
            module_logger.error("Failed to trace request", exc_info=True)

        try:
            response = await call_next(request)
            tracer.add_attribute_to_current_span(HTTP_STATUS_CODE, response.status_code)
            tracer.end_span()
            return response
        # Explicitly handle any internal exception here, and set status code to 500 
        except Exception as exception:
            module_logger.exception(exception)
            tracer.add_attribute_to_current_span(HTTP_STATUS_CODE, 500)
            tracer.end_span()
            return None

I then register this middleware class in main.py like so:

app.middleware("http")(AppInsightsMiddleware(app, sampler=samplers.AlwaysOnSampler()))

Solution

  • Explicitly handle any exception that may occur in processing the API request. That allows you to finish tracing the request, setting the status code to 500. You can then re-throw the exception to ensure that the application raises the expected exception.

            try:
                response = await call_next(request)
                tracer.add_attribute_to_current_span(HTTP_STATUS_CODE, response.status_code)
                tracer.end_span()
                return response
            # Explicitly handle any internal exception here, and set status code to 500 
            except Exception as exception:
                module_logger.exception(exception)
                tracer.add_attribute_to_current_span(HTTP_STATUS_CODE, 500)
                tracer.end_span()
                raise exception