I have a Pandas dataframe, named "impression_data," which includes a column called "site.id," like this:
>>> impression_data['site.id']
0 62
1 189
2 191
3 62
...
Each item in this column has the datatype numpy.int64, like this:
>>> for i in impression_data['site.id']:
...     print type(i)
<type 'numpy.int64'>
<type 'numpy.int64'>
<type 'numpy.int64'>
...
And as expected, membership testing works well so long as I test integers:
>>> 62 in impression_data['site.id']
True
But here's the unexpected result: I was under the impression that a column of np.int64s ought not to include any decimal values whatsoever. Apparently I'm wrong. What's going on here?
>>> 62.5 in impression_data['site.id']
True
Edit 1: All values in the column ought to be integers by construction. For completeness, I have also performed the following casting operation and encountered no errors:
impression_data['site.id'] = impression_data['site.id'].astype('int')
As per @BremBam's suggestions in the comments, I tried
impression_data['site.id'].map(type).unique()
which produces
[<type 'numpy.int64'>]
A minimal example and the real datafile I'm working with are here https://dl.dropboxusercontent.com/u/28347262/SE%20Pandas%20Int64%20Membership%20Testing/cm_impression.csv
and here
This is a bug in pandas. The value is cast to the type of the index before the containment test is done, so 62.5 is converted to 62. (Note that "in" for a Series checks whether the value is in the index, not the values.)
I believe you can get what you want by doing 62.5 in impression_data['site.id'].values, which tests the column's underlying NumPy array rather than its index.
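To make the index-versus-values distinction concrete, here is a minimal sketch, assuming a recent pandas under Python 3 (where, as far as I can tell, the float-to-int cast described above no longer happens). The four-row Series is a hypothetical stand-in for impression_data['site.id'], and the variable names are mine, not from the question.

```python
import pandas as pd

# Hypothetical stand-in for impression_data['site.id']; the default index is 0..3.
site_ids = pd.Series([62, 189, 191, 62], dtype="int64")

# `in` on a Series tests the index labels, not the stored values.
label_is_member = 2 in site_ids    # True: 2 is an index label
value_is_member = 62 in site_ids   # False: 62 is a value, not a label

# Testing against the underlying NumPy array compares the values themselves.
int_in_values = 62 in site_ids.values      # True
float_in_values = 62.5 in site_ids.values  # False

# An explicit element-wise comparison does the same without relying on `in`.
float_matches = bool((site_ids == 62.5).any())  # False
```

This would also explain why 62 in impression_data['site.id'] returned True in the question: with more than 62 rows, 62 is simply one of the index labels.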