python gdal sentinel2

How do I get the correct scale of a band array in GDAL Python?


I am new to using GDAL in Python and I am trying to use it to retrieve the band data from Sentinel-2 SAFE products. I managed to extract the band array, but I couldn't get it scaled correctly.

This extracts the unscaled array of Band 4:

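import os
from osgeo import gdal

product_path = "S2B_MSIL2A_20200124T101219_N0213_R022_T33UUU_20200124T121752.SAFE"
# Open the product-level metadata file inside the SAFE directory
dataset = gdal.Open(os.path.join(product_path, "MTD_MSIL2A.xml"))
# The first subdataset is the one holding the 10 m bands
bands10m_path = dataset.GetSubDatasets()[0][0]
bands10m_dataset = gdal.Open(bands10m_path)
# Raster band 1 of that subdataset is Band 4
b4_band = bands10m_dataset.GetRasterBand(1)
b4_array = b4_band.ReadAsArray()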

So far so good, but the data type of the array is uint16 and the values range from 0 to 16896.

b4_band.GetMinimum() and b4_band.GetMaximum() both return None.

And b4_band.GetStatistics(True,True) returns [0.0, 2829.0, 347.05880000000104, 334.8397839901348] (as min, max, mean, stddev).

Does this help me somehow to extract the correct scale? I am clueless...


Solution

  • Be aware that even when a scale and offset are specified in the file, GDAL won't apply them automatically when you read the array.

    In the case of Sentinel-2 they are not specified in the image file itself, but in the product metadata (XML). You can open the XML file you are using in your example with a text editor and search for "QUANTIFICATION_VALUE", as @Val suggested.

    The quantification values can also be retrieved from the metadata as parsed by GDAL. This can be done with dataset.GetMetadata(), which returns a dictionary. You can also call the gdal.Info utility; both methods are shown below.

    from osgeo import gdal
    
    archive = 'S2A_MSIL2A_20200126T082221_N0213_R121_T34HDH_20200126T110103.zip'
    
    # Use a dataset
    ds = gdal.Open(archive)
    meta = ds.GetMetadata()
    ds = None
    
    # Alternatively use gdal.Info
    r = gdal.Info(archive, format='json')
    meta = r['metadata']['']
    

    You can filter out the relevant values, and convert them from string to float with something like:

    {k: float(v) for k, v in meta.items() if k.endswith('_QUANTIFICATION_VALUE')}
    
    # Results in:
    {'AOT_QUANTIFICATION_VALUE': 1000.0,
     'BOA_QUANTIFICATION_VALUE': 10000.0,
     'WVP_QUANTIFICATION_VALUE': 1000.0}
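
To get from the raw uint16 values to scaled values, the digital numbers are divided by the matching quantification value. Here is a minimal sketch of that step, assuming `meta` is a dictionary like the one returned by GetMetadata() and that no additional offset applies to this product. The helper name `to_reflectance` is made up for illustration and is not part of GDAL:

```python
def to_reflectance(digital_numbers, meta, key="BOA_QUANTIFICATION_VALUE"):
    """Divide raw digital numbers by the quantification value stored in `meta`.

    `digital_numbers` can be a plain number or a NumPy array such as the one
    returned by ReadAsArray(). GDAL metadata values are strings, so the
    quantification value is converted to float before dividing.
    """
    return digital_numbers / float(meta[key])


# Hand-written stand-in for the dictionary returned by ds.GetMetadata()
meta = {"BOA_QUANTIFICATION_VALUE": "10000"}

# 2829 is the maximum reported by GetStatistics in the question
print(to_reflectance(2829, meta))  # 0.2829
```

With the array from the question this would be `b4_reflectance = to_reflectance(b4_array, meta)`, which gives a floating-point array. The raw maximum of 16896 would map to 1.6896.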