The help doc for pandas.Series.nbytes shows the following example:
s = pd.Series(['Ant', 'Bear', 'Cow'])
s
0 Ant
1 Bear
2 Cow
dtype: object
s.nbytes
24
<< end example >>
How is that 24 bytes?
I tried looking at three different encodings, none of which seems to yield that total.
print(s.str.encode('utf-8').str.len().sum())
print(s.str.encode('utf-16').str.len().sum())
print(s.str.encode('ascii').str.len().sum())
10
26
10
Pandas nbytes does not refer to the bytes required to store the string data encoded in specific formats like UTF-8, UTF-16, or ASCII. It refers to the total number of bytes consumed by the underlying array of the Series data in memory.
When using the object dtype, pandas stores a NumPy array of pointers to the Python string objects, not the string data itself.
On a 64-bit system, each pointer/reference takes 8 bytes, so the three elements take 3 × 8 bytes = 24 bytes.
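
To check this yourself, here is a minimal sketch. It passes dtype=object explicitly (an assumption on my part, in case the installed pandas version infers a dedicated string dtype by default) and compares nbytes with memory_usage(deep=True), which also counts the Python string objects the pointers refer to. The names pointer_size and deep_bytes are just illustrative.

```python
import numpy as np
import pandas as pd

# Force the object dtype so the Series holds pointers to Python str objects.
s = pd.Series(['Ant', 'Bear', 'Cow'], dtype=object)

# Size of one object reference in a NumPy object array (8 on a 64-bit build).
pointer_size = np.dtype(object).itemsize
print(pointer_size)

# nbytes counts only the pointer array: number of elements * pointer size.
print(s.nbytes)
print(len(s) * pointer_size)

# deep=True additionally counts the string objects themselves,
# so it comes out larger than nbytes.
deep_bytes = s.memory_usage(index=False, deep=True)
print(deep_bytes)
```

If what you actually want is the memory taken by the strings as well, memory_usage(deep=True) is the closer measure; the exact figure depends on the Python build, so I have not quoted one here.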