asp.net-mvc high-availability uptime downtime service-level-agreement

How to calculate application availability (SLA)

I have a standard ASP.NET MVC project and I need to calculate application availability to determine our SLA level. I need to produce something like the following for our web application.

Information from my hosting provider

System Availability: 99.9860%
Total Uptime: 30d 10h:22m:44s
Total Downtime: 0d 0h:6m:9s
Total Reboots: 3
Mean Time Between Reboots: 10.15 days
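
As a sanity check, the availability percentage above appears to follow directly from the uptime and downtime figures. A minimal Python sketch of that arithmetic, assuming availability is defined as uptime / (uptime + downtime); the function name is made up for illustration:

```python
from datetime import timedelta


def availability_percent(uptime: timedelta, downtime: timedelta) -> float:
    """Availability as a percentage of the total observation window."""
    total_s = (uptime + downtime).total_seconds()
    if total_s == 0:
        raise ValueError("observation window is empty")
    return 100.0 * uptime.total_seconds() / total_s


# Figures copied from the hosting provider's report above.
uptime = timedelta(days=30, hours=10, minutes=22, seconds=44)
downtime = timedelta(minutes=6, seconds=9)

print(f"{availability_percent(uptime, downtime):.4f}%")  # prints 99.9860%
```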

But I need to calculate availability for the application itself. So the question is:

How do I calculate ASP.NET MVC application availability properly?

Maybe someone has already implemented this. Any suggestions on how to do it would be appreciated.

Where to start?

The first thing I thought of is Application Insights and its availability tests. The problem is that the minimum test frequency is 5 minutes, and I need more precise measurements.

Next, I could create a tool that calls my app every second and collects the results. The drawback: a very large number of requests.

I could also read some performance counters from IIS or something similar. I need to investigate whether that is possible.

I know the question may be too broad, but I didn't find any information about implementing application availability measurement. What do you think?

Solution

It would take too long to explain everything that can be done, so I'll keep it short.

Usually you define all these details in a Service Level Agreement, where you also define the availability target (e.g. 99 %), which also includes planned downtime. A 99 % availability target means having the app running, with its functionality as described in the document, with at most approx. 87.6 h of downtime per year. Here is an SLA uptime calculator.
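
The 87.6 h figure is simply 1 % of the 8760 hours in a non-leap year. A small Python sketch of that calculation, in case you want to check other targets; the function name and the choice of a 365-day year are assumptions for illustration:

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(target_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Downtime budget (in hours) implied by an availability target."""
    if not 0 <= target_percent <= 100:
        raise ValueError("target must be a percentage between 0 and 100")
    return period_hours * (1 - target_percent / 100)


for target in (99.0, 99.9, 99.99):
    print(f"{target} % -> {allowed_downtime_hours(target):.2f} h per year")
```

For a 99 % target this gives 87.60 h per year. Each extra "nine" cuts the budget by a factor of ten.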

The normal interval is 5 minutes, as you say, but if you can prove by using an external site / service that the suppliers are not meeting the requirements, you can calculate your loss (revenue loss, labor costs, etc.) and claim the money from them. I assume you already have a Business Impact Analysis (BIA); if not, you should do one.

OK, now to the programming / DevOps part. I usually develop applications / services with this in mind and have them report their status to a third-party service like NewRelic, Uptrends or similar. As an example, I also use a self-made service for this, because of strict requirements to deliver data at least once a second with a hard deadline. In my solution I use WebSockets to send data in both directions on a schedule, on an event, or when needed. A benefit is that you can send a status (good or bad), let's say every 500 ms, and you will know within one second if the app has failed (≈ 499 ms + 500 ms).
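
To illustrate the heartbeat idea (this is not my actual implementation, just a minimal Python sketch): the app sends a status every 500 ms, and the monitor treats the app as down once no heartbeat has arrived for one second. The class name, the 1 s timeout and the explicit timestamps are assumptions for the example. A real monitor would receive the beats over WebSockets and read a monotonic clock such as time.monotonic().

```python
from typing import Optional


class HeartbeatMonitor:
    """Tracks the last heartbeat and flags the app as down after a timeout."""

    def __init__(self, timeout_s: float = 1.0) -> None:
        self.timeout_s = timeout_s
        self.last_seen: Optional[float] = None

    def beat(self, now: float) -> None:
        """Record a heartbeat received at time `now` (seconds)."""
        self.last_seen = now

    def is_alive(self, now: float) -> bool:
        """True if a heartbeat arrived within the last `timeout_s` seconds."""
        if self.last_seen is None:
            return False
        return (now - self.last_seen) <= self.timeout_s


monitor = HeartbeatMonitor(timeout_s=1.0)
monitor.beat(0.0)
monitor.beat(0.5)             # last heartbeat before the app fails
print(monitor.is_alive(1.4))  # True: still inside the 1 s window
print(monitor.is_alive(1.6))  # False: heartbeats at 1.0 and 1.5 never arrived
```

With a 500 ms heartbeat and a 1 s timeout, a failure right after a heartbeat is flagged about one second later, which matches the estimate above.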

Using a service like this, you can measure the uptime, custom events of interest, possible errors and a ton of other metrics within a second. Usually it is within 5-100 ms, but the worst-case execution / response time (WCET/WCRT) is hard to estimate.

To answer your question: you cannot calculate application availability with so few measure points. Checking once every 5 min covers approx. 12 seconds per hour, and you cannot base any reliable calculation on that. You can assume everything was OK between the measure points, but that is called guessing. I have made implementations with 14 400 measure points per hour in order to provide 500 ms accuracy (banks).
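
A rough Python sketch of why the sampling interval matters. It estimates availability as the share of successful probes, which is an assumption for illustration, and the function names are made up. With 12 probes per hour, a single failed probe swings the estimate by more than 8 percentage points, even though the real outage may have lasted a second or almost ten minutes.

```python
from typing import Sequence


def samples_per_hour(interval_s: float) -> float:
    """Number of measure points per hour for a given probe interval."""
    return 3600 / interval_s


def estimate_availability(results: Sequence[bool]) -> float:
    """Share of successful probes, as a percentage."""
    if not results:
        raise ValueError("no probe results")
    return 100.0 * sum(results) / len(results)


print(samples_per_hour(300))   # 12.0    (one probe every 5 minutes)
print(samples_per_hour(0.5))   # 7200.0  (one probe every 500 ms)
print(samples_per_hour(0.25))  # 14400.0 (one probe every 250 ms)

# One hour of 5-minute probes with a single failure:
coarse = [True] * 11 + [False]
print(f"{estimate_availability(coarse):.2f} %")

# The same single failed probe at 14 400 probes per hour:
fine = [True] * 14399 + [False]
print(f"{estimate_availability(fine):.4f} %")
```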

I hope you got an answer that helps you with your problem.