I am trying to understand Normalize() from torchvision.transforms. After using it, the mean should be 0 and the std should be 1. At least that's what the tutorials say. But it does not work!

I tried this code:
import torch
from torchvision import transforms

tensor = torch.Tensor([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
normalize = transforms.Normalize(3.5, 1.7838)  # mean and std taken from tensor.mean(), tensor.std()
norm_tensor = normalize(tensor)
The output is:

norm_tensor.mean(): tensor(-4.9671e-09)
norm_tensor.std(): tensor(1.0000)
Shouldn't the mean be exactly 0 instead of -4.9671e-09? I tried googling and found a few tutorials, for example this. Strangely enough, they get 3.4769e-08 but still write "Yes!!!! Our Tensor is normalized to 0 mean [...]".

Am I missing something, or is "close to 0" rather than exactly 0 the correct result of normalization?
I assume this is due to floating point subtraction [*].
Consider the following:

>>> 1 - 1/3 - 2/3
1.1102230246251565e-16
This is because floating-point subtraction is inexact. By inexact, I mean it loses significant figures: 1/3 and 2/3 are not exactly representable in binary to begin with, and when you subtract two numbers that are close to each other, those tiny representation and rounding errors dominate the result instead of cancelling out. So you usually won't get exactly 0.
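To get a feel for the size of that leftover, you can compare it against machine epsilon. Here is a quick illustrative check using Python's standard sys module (nothing PyTorch-specific is assumed):

>>> import sys
>>> err = 1 - 1/3 - 2/3
>>> err
1.1102230246251565e-16
>>> sys.float_info.epsilon        # spacing between 1.0 and the next representable double
2.220446049250313e-16
>>> abs(err) <= sys.float_info.epsilon   # the residue is at the level of rounding error
True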
Since normalization involves subtraction (of the mean), you won't get exactly zero: each element carries a small error relative to the ideal value it would have had if the exact mean had been subtracted (and the result divided by the exact standard deviation).
The computed mean itself is also not exactly the true mean, although that is less likely to be the main source of error.
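In practice, the way to check a result like this is with a tolerance rather than exact equality. Here is a minimal sketch that recomputes the normalization by hand and uses torch.allclose (the atol=1e-6 threshold is just an arbitrary choice for float32):

>>> import torch
>>> t = torch.tensor([[[1., 2., 3.], [4., 5., 6.]],
...                   [[1., 2., 3.], [4., 5., 6.]]])
>>> centered = (t - t.mean()) / t.std()
>>> torch.allclose(centered.mean(), torch.tensor(0.), atol=1e-6)   # "zero" up to float32 noise
True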
The question is, why do you NEED it to be exactly zero?
You will notice that 1 - 1/2 - 1/4 - 1/4 == 0 holds, because 1/2 and 1/4 (and every intermediate result here) happen to be exactly representable in binary floating point, though with a bit of effort we can "break" seemingly nice numbers too.
Example
>>> 1e50 - 2e49 - 4e49 -4e49
1.0384593717069655e+34
>>> 1e20 - 2e19 - 4e19 -4e19
0.0
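The difference is that 1e20, 2e19 and 4e19 (and every intermediate difference) happen to be exactly representable as doubles, while 1e50 and its terms are only stored as nearby approximations. A quick way to see this in plain Python is to compare the floats against exact integers:

>>> 1e20 == 10**20    # the double 1e20 stores this value exactly
True
>>> 1e50 == 10**50    # the double 1e50 is only a nearby approximation
False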
Bottom line: don't sweat so much about getting exact results when working with floating-point numbers. Since you are working with finite-precision arithmetic on a digital computer, you have to compromise on accuracy to a degree.
There are libraries that don't have these problems because they implement exact arithmetic purely in software; they are much slower for exactly that reason.
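For example, Python's built-in fractions module does exact rational arithmetic (in software, hence considerably slower than hardware floats):

>>> from fractions import Fraction
>>> 1 - Fraction(1, 3) - Fraction(2, 3)
Fraction(0, 1)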