I am reading an article on amortized analysis of algorithms. The following is a text snippet.
Amortized analysis is similar to average-case analysis in that it is concerned with the cost averaged over a sequence of operations. However, average-case analysis relies on probabilistic assumptions about the data structures and operations in order to compute an expected running time of an algorithm. Its applicability is therefore dependent on certain assumptions about the probability distribution of algorithm inputs.
An average case bound does not preclude the possibility that one will get “unlucky” and encounter an input that requires more-than-expected time even if the assumptions for probability distribution of inputs are valid.
My questions about the above text snippet are:
1. In the first paragraph, how does average-case analysis “rely on probabilistic assumptions about the data structures and operations”? I know average-case analysis depends on the probability distribution of inputs, but what does the above statement mean?
2. In the second paragraph, what does the author mean by saying that one can still get “unlucky” and exceed the expected running time even when the assumptions about the probability distribution of inputs are valid?
Thanks!
To get an average-case time complexity, you need to make assumptions about what the "average case" actually is. If inputs are strings, what is the "average string"? Does only length matter? If so, what is the average length of the strings you will get? If not, what are the average characters in those strings? These questions become difficult to answer definitively if the strings are, for instance, last names. What is the average last name?
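To make that concrete, here is a minimal sketch (the helper `linear_search`, the uniform-position assumption, and all parameters are my own illustration, not from the text). A linear search over n items averages about (n + 1) / 2 comparisons, but only under the assumption that the target is equally likely to be at any position; change that distribution and the "average case" changes with it.

```python
import random

def linear_search(items, target):
    """Return the number of comparisons made before target is found."""
    for i, item in enumerate(items):
        if item == target:
            return i + 1
    return len(items)

# Assumption (illustrative): the target is uniformly distributed over the
# n positions. Only under that assumption is the expected number of
# comparisons (n + 1) / 2.
n = 1000
trials = 10_000
items = list(range(n))
total = sum(linear_search(items, random.randrange(n)) for _ in range(trials))

print(f"empirical average comparisons: {total / trials:.1f}")
print(f"analytic average under the uniform assumption: {(n + 1) / 2}")
```

If the targets were instead drawn mostly from the front of the list, the same code would report a much smaller average, which is exactly why the analysis is only as good as its distributional assumption.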
In most interesting statistical samples, the maximum value is strictly greater than the mean. This means that your average-case analysis will sometimes underestimate the time/resources needed for certain (problematic) inputs. If you think about it, for a symmetric PDF an average-case analysis underestimates about as often as it overestimates. Worst-case analysis, OTOH, considers only the most problematic case(s), and so is guaranteed never to underestimate.
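A quick simulation illustrates the point (the helper `quicksort_comparisons` and all parameters here are my own illustration): counting the comparisons a simple quicksort makes on random permutations, the sample maximum sits well above the mean, so an average-case figure underestimates those runs while a worst-case bound does not.

```python
import random

def quicksort_comparisons(arr):
    """Count the element comparisons a simple quicksort performs on arr."""
    if len(arr) <= 1:
        return 0
    pivot = arr[0]
    left = [x for x in arr[1:] if x < pivot]
    right = [x for x in arr[1:] if x >= pivot]
    # len(arr) - 1 comparisons against the pivot at this level
    return len(arr) - 1 + quicksort_comparisons(left) + quicksort_comparisons(right)

random.seed(0)
n = 200
runs = [quicksort_comparisons(random.sample(range(n), n)) for _ in range(1000)]

mean = sum(runs) / len(runs)
print(f"mean comparisons: {mean:.0f}")       # roughly 2 n ln n for random input
print(f"max comparisons:  {max(runs)}")      # strictly above the mean
print(f"runs exceeding the mean: {sum(r > mean for r in runs)}")
```

Every run counted in the last line is one where the average-case figure would have been an underestimate; the worst-case bound of n(n - 1) / 2 comparisons, by contrast, is never exceeded.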