I am new to using GDAL in Python and I am trying to use it to retrieve the band data from Sentinel 2 SAFE-products. I managed to extract the band array, but I couldn't manage to get it scaled correctly.
This extracts the unscaled array of Band 4:
import gdal
product_path = "S2B_MSIL2A_20200124T101219_N0213_R022_T33UUU_20200124T121752.SAFE"
dataset = gdal.Open(product_path + "MTD_MSIL2A.xml")
bands10m_path = dataset.GetSubDatasets()[0][0]
bands10m_dataset = gdal.Open(bands10m_path)
b4_band = bands10m_dataset.GetRasterBand(1)
b4_array = b4_band.ReadArray()
So far so good, but the data type of the array is uint16
and the values range from 0
to 16896
.
b4_band.GetMinimum()
and b4_band.GetMaximum()
both return None
.
And b4_band.GetStatistics(True,True)
returns [0.0, 2829.0, 347.05880000000104, 334.8397839901348]
(as min, max, mean, stddev).
Does this help me somehow to extract the correct scale? I am clueless...
It's good to be aware that even if the scale/offset are specified in the file, GDAL won't automatically apply them.
In the case of Sentinel 2 they are not specified in the file, but in the metadata (xml). You can look in the xml-file you are using in your example with a text editor. And search for "QUANTIFICATION_VALUE" as @Val suggested.
It can also be retrieved from the metadata as parsed by GDAL. This can be done with dataset.GetMetadata()
which will return a dictonary. You can also call the gdal.Info
utilty, both methods are shown below.
import gdal
archive = 'S2A_MSIL2A_20200126T082221_N0213_R121_T34HDH_20200126T110103.zip'
# Use a dataset
ds = gdal.Open(archive)
meta = ds.GetMetadata()
ds = None
# Alternatively use gdal.Info
r = gdal.Info(archive, format='json')
meta = r['metadata']['']
You can filter out the relevant values, and convert them from string to float with something like:
{k: float(v) for k,v in meta.items() if k.endswith('_QUANTIFICATION_VALUE')}
# Result in:
{'AOT_QUANTIFICATION_VALUE': 1000.0,
'BOA_QUANTIFICATION_VALUE': 10000.0,
'WVP_QUANTIFICATION_VALUE': 1000.0}