computer-science, information-theory, entropy, compression

What is the computer science definition of entropy?


I've recently started a course on data compression at my university. However, I find the use of the term "entropy" as it applies to computer science rather ambiguous. As far as I can tell, it roughly translates to the "randomness" of a system or structure.

What is the proper definition of computer science "entropy"?


Solution

  • Entropy can mean different things:

    Computing

    In computing, entropy is the randomness collected by an operating system or application for use in cryptography or other purposes that require random data. This randomness is often collected from hardware sources, either pre-existing ones such as mouse movements, or specially provided randomness generators.
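    As a concrete illustration (my sketch, not part of the original answer), Python's standard library exposes this OS-collected entropy through `os.urandom` and the `secrets` module:

    ```python
    import os
    import secrets

    # Ask the operating system for 16 bytes drawn from its entropy pool
    # (on Linux this is typically backed by getrandom()/dev/urandom).
    raw = os.urandom(16)
    print(raw.hex())

    # The secrets module wraps the same OS source for cryptographic use,
    # e.g. generating an unguessable session token.
    token = secrets.token_hex(16)
    print(token)
    ```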

    Information theory

    In information theory, entropy is a measure of the uncertainty associated with a random variable. The term by itself in this context usually refers to the Shannon entropy, which quantifies, in the sense of an expected value, the information contained in a message, usually in units such as bits. Equivalently, the Shannon entropy is a measure of the average information content one is missing when one does not know the value of the random variable.
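    Concretely, for a discrete random variable X taking value x with probability p(x), the Shannon entropy is H(X) = -Σ p(x) log2 p(x), measured in bits. Here is a small self-contained Python sketch (my addition) that estimates it from empirical byte frequencies:

    ```python
    import math
    from collections import Counter

    def shannon_entropy(data: bytes) -> float:
        """H(X) = -sum over x of p(x) * log2(p(x)), in bits per byte,
        where p(x) is the empirical frequency of byte value x in data."""
        counts = Counter(data)
        n = len(data)
        return sum(-(c / n) * math.log2(c / n) for c in counts.values())

    print(shannon_entropy(b"aaaaaaaa"))        # 0.0: a constant source is fully predictable
    print(shannon_entropy(bytes(range(256))))  # 8.0: uniform over all 256 byte values
    ```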

    Entropy in data compression

    Entropy in data compression may denote the randomness of the data that you are inputting to the compression algorithm. The higher the entropy, the lower the achievable compression ratio; that is, the more random the text is, the less you can compress it.
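    You can observe this directly with a general-purpose compressor. A quick sketch (my example) using Python's zlib:

    ```python
    import os
    import zlib

    low = b"abab" * 4096         # repetitive, low-entropy input
    high = os.urandom(len(low))  # random, high-entropy input of equal length

    for name, data in (("low-entropy", low), ("high-entropy", high)):
        compressed = zlib.compress(data, 9)  # 9 = maximum compression level
        print(f"{name}: {len(data)} -> {len(compressed)} bytes")

    # The repetitive input shrinks to a tiny fraction of its size; the random
    # input barely shrinks at all (it may even grow from format overhead).
    ```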

    Shannon's entropy represents an absolute limit on the best possible lossless compression of any communication. Treating messages to be encoded as a sequence of independent and identically distributed random variables, Shannon's source coding theorem shows that, in the limit, the average length of the shortest possible representation of the messages in a given alphabet is their entropy divided by the logarithm of the number of symbols in the target alphabet.
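    In symbols (a restatement of the paragraph above, with H measured in bits): for a target alphabet of n symbols, the expected code length L per source symbol of any lossless code satisfies

    ```latex
    L \;\ge\; \frac{H(X)}{\log_2 n},
    \qquad\text{where } H(X) = -\sum_{x} p(x)\,\log_2 p(x),
    ```

    and this bound is approached in the limit of long blocks. For a binary target alphabet (n = 2), the bound is simply H(X) bits per symbol, which is why entropy in bits can be read directly as a lower bound on compressed size.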