I have a pandas DataFrame df with a column A. The values of A are based on predictions, and I've forced them to be greater than or equal to 0.00000001.
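(For context, the forcing was done with something along the lines of df['A'] = df['A'].clip(lower=1e-08); the exact call isn't important for the question.)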
Now when I run df.A.describe() I get:
count 3.900000e+02
mean 1.047049e-05
std 7.774749e-05
min 1.000000e-08
25% 1.000000e-08
50% 1.000000e-08
75% 1.000000e-08
max 1.008428e-03
The way I understand it, since the min, 25%, 50%, and 75% values are all 1.000000e-08, at least 75% of my values for A should be equal to 0.00000001.
However, when I run x = len(df.loc[df['A'] == 0.00000001]) I get x = 207, and 207/390 ≈ 0.53, which is well below 0.75.
Shouldn't I get a value for x of at least 293 (since 390 * 0.75 = 292.5)?
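To spell out the check I had in mind (a minimal sketch; df is the same DataFrame as above):

q75 = df['A'].quantile(0.75)   # reported by describe() as 1.000000e-08
print((df['A'] <= q75).sum())  # by definition of the 75th percentile, this should be >= 293 out of 390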
For anyone who might be running into a similar problem, I've found the answer:
There are indeed only 207 values in my df with df.A exactly equal to 0.00000001. However, there are also some values that are just marginally bigger (e.g. maybe df.A == 0.0000000100000000001). Even though those values are not exactly equal to 0.00000001, they are shown as 0.00000001 when I print the df or run df.A.describe(), because the difference is too small to show up at the default display precision.
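A quick way to see this for yourself (a rough sketch; df is the DataFrame from above, and the exact counts will depend on your data) is to print the column at full precision and to count with a small tolerance instead of exact equality:

import numpy as np
import pandas as pd

# Show the column at (close to) full float precision instead of the default
# 6 significant digits, so the marginally bigger values become visible.
with pd.option_context('display.float_format', '{:.17e}'.format):
    print(df['A'].describe())

# Exact equality only catches the values that are literally 1e-08 ...
exact = (df['A'] == 1e-08).sum()
# ... while a relative tolerance of 5e-7 (roughly what the 6-digit display
# rounds away) also picks up the marginally bigger values.
near = np.isclose(df['A'], 1e-08, rtol=5e-07, atol=0.0).sum()
print(exact, near)  # in my case: 207 exact matches, while near should be >= 293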