
What is the difference between Static HDR and dynamic HDR?


HDR (high dynamic range) is widely used in video devices to provide a better viewing experience. What is the difference between static HDR and dynamic HDR?


Solution

  • Dynamic HDR can achieve higher HDR media quality across a variety of displays.

    The following presentation, SMPTE ST 2094 and Dynamic Metadata, summarizes the subject of Dynamic Metadata:

    Dynamic Metadata for Color Volume Transforms (DMCVT)

    It all starts with digital quantization.
    Assume you need to approximate the numbers between 0 and 1,000,000 using only 1000 possible values.
    Your first option is uniform quantization:
    Values in the range [0, 999] are mapped to 0, values in [1000, 1999] are mapped to 1, values in [2000, 2999] are mapped to 2, and so on...

    When you need to restore the original data, you can't restore it exactly, so you restore each code to the value with the minimal average error:
    0 is mapped back to 500 (the center of the range [0, 999]).
    1 is mapped back to 1500 (the center of the range [1000, 1999]).
    When you restore the quantized data, you lose a lot of information.
    The information you lose is called "quantization error".
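    A minimal Python sketch of this uniform quantization (the step of 1000 and the ranges are taken from the example above):

    ```python
    def quantize(value, step=1000):
        """Map a value in [0, 999999] to a code in [0, 999]."""
        return min(value // step, 999)

    def dequantize(code, step=1000):
        """Restore a code to the center of its range, minimizing the average error."""
        return code * step + step // 2

    original = 1234
    code = quantize(original)        # 1  (1234 falls in [1000, 1999])
    restored = dequantize(code)      # 1500
    print(abs(original - restored))  # 266 -- the "quantization error"
    ```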

    Common HDR video uses 10 bits per color component (10 bits for the Y component, 10 bits for U and 10 bits for V; or 10 bits each for red, green and blue in an RGB color space).
    10 bits can store 1024 possible values (the range [0, 1023]).

    Assume you have a very good monitor that can display 1,000,001 different brightness levels (0 is the darkest and 1,000,000 is the brightest).
    Now you need to quantize the 1,000,001 levels down to 1024 values.

    Since the response of the human visual system to brightness is not linear, the uniform quantization illustrated above is sub-optimal.

    The quantization to 10 bits is performed after applying a gamma function.
    An example of a gamma function: divide each value by 1,000,000 (the new range is [0, 1]), compute the square root of each value, and multiply the result by 1,000,000.
    Apply the quantization after the gamma function.
    The result: more accuracy is kept on the darker values, at the expense of the brighter values.
    The monitor does the opposite operations (de-quantization, then inverse gamma).
    Performing the quantization after applying a gamma function results in better quality for the human visual system.
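    A small sketch of that square-root gamma example, assuming the 1,000,001-level monitor from above (the constants are illustrative):

    ```python
    import math

    MAX_LEVEL = 1000000  # brightest level of the hypothetical monitor
    CODES = 1024         # 10 bits

    def encode(level):
        """Apply the square-root 'gamma' and quantize to 10 bits."""
        normalized = level / MAX_LEVEL       # [0, 1]
        gamma = math.sqrt(normalized)        # spends more codes on dark values
        return min(int(gamma * (CODES - 1) + 0.5), CODES - 1)

    def decode(code):
        """De-quantize and apply the inverse gamma (squaring)."""
        gamma = code / (CODES - 1)
        return (gamma ** 2) * MAX_LEVEL

    # Dark values land on much finer steps than bright values:
    print(decode(1) - decode(0))        # ~0.96 levels per code near black
    print(decode(1023) - decode(1022))  # ~1954 levels per code near white
    ```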

    In reality, the square root is not the best gamma function.
    There are three types of standard HDR static gamma functions.

    Can we do better?
    What if we could select the optimal "gamma function" for each video frame?

    An example of Dynamic Metadata:
    Consider the case where all the brightness levels in the image are in the range [500000, 501000].
    Now we can map all the levels to 10 bits without any quantization loss.
    All we need to do is send 500000 as the minimum level and 501000 as the maximum level in the image metadata.
    Instead of quantization, we can just subtract 500000 from each value.
    The monitor that receives the image reads the metadata and knows to add 500000 to each value, so there is a perfect data reconstruction (no quantization errors).
    Assume the levels of the next image are in the range [400000, 401000], so we need to adjust the metadata (dynamically).
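    A toy sketch of that per-frame metadata idea (the field names here are made up for illustration, not taken from any standard):

    ```python
    def encode_frame(levels):
        """Pack a frame whose levels span a narrow range, plus per-frame metadata."""
        lo, hi = min(levels), max(levels)
        codes = [v - lo for v in levels]      # fits in 10 bits when hi - lo < 1024
        return codes, {"min_level": lo, "max_level": hi}

    def decode_frame(codes, metadata):
        """The display reads the metadata and restores the levels exactly."""
        return [c + metadata["min_level"] for c in codes]

    frame1 = [500000, 500500, 501000]
    codes, meta = encode_frame(frame1)
    assert decode_frame(codes, meta) == frame1   # perfect reconstruction

    frame2 = [400000, 400250, 401000]            # next frame: metadata adjusts
    codes, meta = encode_frame(frame2)
    assert decode_frame(codes, meta) == frame2
    ```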


    In case you are still reading...


    I am really not sure that the main advantage of DMCVT is reducing quantization errors.
    (It was just simpler to give an example based on reducing quantization errors.)

    Reducing the conversion errors:
    Accurate conversion from the digital representation of the input (e.g. BT.2100) to the optimal pixel value of the display (like the RGB voltage of the pixel) requires "heavy math".
    The conversion process is called Color Volume Transformation.
    Displays replace the heavy computation with mathematical approximations (using look-up tables and interpolations [I suppose]).
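    To illustrate the kind of approximation meant here, a generic 1D look-up table with linear interpolation (real displays presumably use much larger, often 3D, tables; this is only a sketch, and the transform is a stand-in):

    ```python
    import math

    def expensive_transform(x):
        """Stand-in for the 'heavy math' of a Color Volume Transformation."""
        return math.sqrt(x)  # placeholder; real CVTs are far more complex

    # Build a small LUT once (offline), instead of computing per pixel.
    N = 17
    lut = [expensive_transform(i / (N - 1)) for i in range(N)]

    def approx_transform(x):
        """Cheap per-pixel approximation: LUT lookup + linear interpolation."""
        pos = x * (N - 1)
        i = min(int(pos), N - 2)
        frac = pos - i
        return lut[i] * (1 - frac) + lut[i + 1] * frac

    print(expensive_transform(0.3), approx_transform(0.3))  # ~0.5477 vs ~0.5472
    ```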

    Another advantage of DMCVT is moving the "heavy math" from the display to the video post-production process.
    The computational resources in the video post-production stage are orders of magnitude greater than the display's resources.
    In the post-production stage, the computers can calculate metadata that helps the display perform a much more accurate Color Volume Transformation (with fewer computational resources), reducing the conversion errors considerably.


    Example from the presentation:
    Dynamic Tone Mapping


    Why are "HDR static gamma functions" called static?
    As opposed to DMCVT, static gamma functions are fixed across the entire movie, or fixed (pre-defined) across the entire "system".
    For example: most PC systems (PCs and monitors) use the sRGB color space (not HDR).
    The sRGB standard uses the following fixed gamma function (linear value L and encoded value V, both in [0, 1]):
    V = 12.92·L                  for L ≤ 0.0031308
    V = 1.055·L^(1/2.4) − 0.055  for L > 0.0031308
    Both the PC system and the display know in advance that they are working with the sRGB standard, and know that this is the gamma function in use (without adding any metadata, or by adding a single byte of metadata that marks the video data as sRGB).
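    For completeness, the fixed sRGB transfer function and its inverse in Python (the constants come from the sRGB standard):

    ```python
    def srgb_oetf(linear):
        """Fixed sRGB gamma: linear light in [0, 1] -> encoded value in [0, 1]."""
        if linear <= 0.0031308:
            return 12.92 * linear
        return 1.055 * linear ** (1 / 2.4) - 0.055

    def srgb_eotf(encoded):
        """Inverse function, applied by the display."""
        if encoded <= 0.04045:
            return encoded / 12.92
        return ((encoded + 0.055) / 1.055) ** 2.4

    x = 0.5
    assert abs(srgb_eotf(srgb_oetf(x)) - x) < 1e-12  # round-trips exactly
    ```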