It is said that a CRC (Cyclic Redundancy Check) can detect burst errors of fewer than r + 1 bits, where r is the degree of the polynomial. Furthermore, a burst of length greater than r + 1 bits is detected with probability 1 − 2^-r.
Can someone point me to a proof of these claims?
Not quite true. An r-bit CRC will detect all burst patterns of length r+1 except for one pattern, the polynomial itself. See these lecture notes for the proof.
It is simply that for an error to go undetected, the CRC polynomial has to divide the error polynomial. A burst of length r+1 is an error polynomial of degree r, multiplied by x^k for wherever the burst sits in the message. All CRC polynomials have a 1 term (x is not a factor of them), so the CRC polynomial cannot divide x^k and would have to divide the degree-r burst polynomial itself. A degree-r polynomial cannot divide a nonzero polynomial of lower degree, which is why every burst of length r or less is caught, and the only degree-r polynomial it divides is itself.
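If it helps to see this concretely, here is a minimal sketch of my own (not from the lecture notes), using an 8-bit CRC with the illustrative polynomial 0x107 = x^8 + x^2 + x + 1 and no init or final XOR. It XORs every possible 9-bit burst (length r+1 = 9) into a test message and checks which patterns leave the CRC unchanged:

```python
# Minimal sketch, not production code: plain bitwise CRC-8 with the
# illustrative polynomial 0x107, no reflection, no init/final XOR.
def crc8(data: bytes, poly: int = 0x107) -> int:
    reg = 0
    for byte in data:
        reg ^= byte
        for _ in range(8):
            reg = ((reg << 1) ^ poly) if reg & 0x80 else (reg << 1)
    return reg

msg = b"123456789"
base = crc8(msg)
total_bits = len(msg) * 8

undetected = set()
for start in range(total_bits - 9 + 1):    # every position for a 9-bit burst
    for pattern in range(1 << 9):          # every 9-bit error pattern
        # a burst of length exactly 9 flips its first and last bits
        if not (pattern & 0x100 and pattern & 1):
            continue
        corrupted = int.from_bytes(msg, "big") ^ (pattern << (total_bits - 9 - start))
        if crc8(corrupted.to_bytes(len(msg), "big")) == base:
            undetected.add(pattern)

print([hex(p) for p in undetected])        # ['0x107'], only the polynomial itself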
Your other claim is a property of any r-bit hash that distributes messages with equal probability over all possible r-bit values of the hash, which a CRC does. 2^-r is simply the probability that an applied error just so happens to result in the same CRC, for which there are 2^r possibilities. It's the same as saying that the chance of rolling the same number you just rolled on a 6-sided die is 1/6.
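You can put a rough Monte Carlo check on that probability too, reusing the crc8() sketch above (the 32-bit burst length and trial count are arbitrary choices of mine): for random bursts well longer than r+1, the undetected fraction should land near 2^-8 ≈ 0.0039.

```python
import random
random.seed(1)

msg = b"123456789"
base = crc8(msg)                    # crc8() from the snippet above
total_bits = len(msg) * 8
burst_len = 32                      # any length > r+1 = 9 works

trials = 100_000
misses = 0
for _ in range(trials):
    start = random.randrange(total_bits - burst_len + 1)
    # first and last bits forced to 1 so the burst length is exactly 32;
    # the 30 interior bits are uniformly random
    pattern = (1 << (burst_len - 1)) | (random.getrandbits(burst_len - 2) << 1) | 1
    corrupted = int.from_bytes(msg, "big") ^ (pattern << (total_bits - burst_len - start))
    if crc8(corrupted.to_bytes(len(msg), "big")) == base:
        misses += 1

print(misses / trials, 2 ** -8)     # both come out around 0.0039
```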