I've come across the notation recently and am confused by what I see. My previous understanding is that big O is supposed to denote an upper bound, whereas lower and tight bounds use different notations entirely (Omega and Theta, respectively). That understanding doesn't help me make sense of what people mean by "best/worst/average case O(...)", because it seems like O(...) should always refer to the worst case.
As an example, I see claims to the effect that "hash maps are on average O(1)". As another, insertion with cuckoo hashing is sometimes said to be "expected to be O(1)" because of how the probability of kicking out other keys is bounded.
In both of these cases, big O notation seems to be used to describe something other than the asymptote of the worst case complexity. For example, in the worst case we should expect every insertion into a normal hash map to be a collision; "average worst case" doesn't really make sense to me. "Expected worst case" seems similarly strange. I seem to be missing something important. What am I misunderstanding?
I've tried to check whether I misunderstood big O notation by looking it up on e.g. Wikipedia, Khan Academy, and similar resources. They seem to confirm that big O is used to denote the worst case scenario only.
This is a common source of confusion when learning complexity analysis, and many resources do a poor job of explaining it. The answer is that big-O notation is not specific to worst case complexity. Big-O, big-Omega, and big-Theta are all tools for describing the growth of mathematical functions, no matter what those functions represent (they don't actually need to represent anything at all).
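To make that concrete, here are the standard definitions written out in LaTeX (a sketch of the usual textbook formulation). Note that nothing in them mentions worst, best, or average cases; they only relate the growth of one function f to another function g:

```latex
% Standard asymptotic definitions: pure statements about two functions f and g,
% with no mention of worst, best, or average cases.
\begin{align*}
f(n) = O(g(n))      &\iff \exists\, c > 0,\ n_0 \text{ such that } f(n) \le c \cdot g(n) \text{ for all } n \ge n_0 \\
f(n) = \Omega(g(n)) &\iff \exists\, c > 0,\ n_0 \text{ such that } f(n) \ge c \cdot g(n) \text{ for all } n \ge n_0 \\
f(n) = \Theta(g(n)) &\iff f(n) = O(g(n)) \text{ and } f(n) = \Omega(g(n))
\end{align*}
```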
In computer science, it just so happens that we most often apply big-O notation to the function that describes an algorithm's worst case runtime complexity, hence the confusion. But as you have discovered, it is equally valid to apply it to functions that describe the average case, the best case, or anything else. That said, big-O is usually the most useful notation to apply to the worst case, because the resulting bound then also holds for the algorithm's complexity in every other case: if the worst case of some algorithm is O(n²), it can never be slower than that, so the algorithm's complexity in all cases is O(n²). That is, no matter the input, the running time is described by some function that does not grow faster than n².
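To tie this back to your hash map example, here is a minimal Python sketch of a separate-chaining table (the class and the deliberately bad hash function are hypothetical, made up for illustration). The same lookup code gives rise to two different complexity functions, an average case of O(1) (assuming a decent hash and a constant load factor) and a worst case of O(n), and big-O can legitimately be applied to either of them.

```python
# Hypothetical separate-chaining hash table, for illustration only.
class ChainedHashTable:
    def __init__(self, num_buckets=16, hash_fn=hash):
        self.buckets = [[] for _ in range(num_buckets)]
        self.hash_fn = hash_fn  # swap in a degenerate hash to force the worst case

    def insert(self, key, value):
        bucket = self.buckets[self.hash_fn(key) % len(self.buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite existing key
                return
        bucket.append((key, value))

    def lookup(self, key):
        # Average case: a good hash spreads keys evenly, so this bucket holds
        # O(1) entries (assuming the load factor is kept constant).
        # Worst case: every key landed in this one bucket, so the scan is O(n).
        bucket = self.buckets[self.hash_fn(key) % len(self.buckets)]
        for k, v in bucket:
            if k == key:
                return v
        raise KeyError(key)

# Real tables resize to keep the load factor constant; here we just
# pick enough buckets up front for the "good" case.
good = ChainedHashTable(num_buckets=2048)        # lookups average O(1)
bad = ChainedHashTable(hash_fn=lambda k: 0)      # every key collides: lookups take Theta(n)
for i in range(1000):
    good.insert(i, i)
    bad.insert(i, i)
```

"Hash maps are on average O(1)" is a statement about the first of those two functions; the worst case function for this table is Theta(n), and both statements are perfectly well-formed uses of asymptotic notation.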
Even so, it is perfectly valid to apply the other notations to the worst case. For example, an algorithm whose complexity function in the worst case is exactly 3n² + 4n + 8 can also be said to be Omega(n²) in the worst case. The reason this is not usually done is that it produces a less useful statement, since it leaves open the possibility that the complexity is arbitrarily high, e.g. 2ⁿ. There is, however, a very interesting application of big-Omega to worst case complexity: the statement that comparison-based sorting is Omega(n lg n) in the worst case. This means it is impossible to create a sorting algorithm whose central operation is comparing pairs of elements and have it always run faster than n lg n: there will always be some inputs on which the algorithm requires on the order of n lg n comparisons or more. The algorithm might be faster on other inputs, so it is not true that the best case of comparison-based sorting is Omega(n lg n), but the worst case is.
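In case it helps, here is a sketch of the standard decision-tree argument behind that worst case lower bound (the usual textbook reasoning, not anything specific to one algorithm):

```latex
% A comparison sort must distinguish all n! orderings of its input, so the
% decision tree formed by its comparisons has at least n! leaves. A binary
% tree of height h has at most 2^h leaves, and the height is the worst case
% number of comparisons, so:
\[
2^h \ge n!
\quad\Longrightarrow\quad
h \ge \log_2(n!) = \sum_{k=1}^{n} \log_2 k \ge \frac{n}{2}\log_2\frac{n}{2} = \Omega(n \lg n).
\]
```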