[SOLVED] How does Pandas.Series.nbytes work for strings? Results don't seem to match expectations

The help doc for pandas.Series.nbytes shows the following example:

s = pd.Series(['Ant', 'Bear', 'Cow'])
s

0     Ant
1    Bear
2     Cow
dtype: object

s.nbytes

24
<< end example >>

How is that 24 bytes?
I tried looking at three different encodings, none of which seems to yield that total.

print(s.str.encode('utf-8').str.len().sum())
print(s.str.encode('utf-16').str.len().sum())
print(s.str.encode('ascii').str.len().sum())

10
26
10

Solution

Series.nbytes does not measure the size of the strings encoded as UTF-8, UTF-16, ASCII, or any other encoding. It reports the number of bytes occupied in memory by the array that backs the Series.

With the object dtype, that backing array is a NumPy array of pointers to Python str objects. The string contents themselves live elsewhere in memory and are not counted.

On a 64-bit system, each pointer takes 8 bytes, so:

3 × 8 bytes = 24 bytes.
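
The arithmetic above can be checked directly. This is a minimal sketch, assuming a 64-bit Python build; the variable names are only illustrative. It passes dtype=object explicitly, because newer pandas versions may infer a dedicated string dtype for a list of strings, in which case nbytes is computed differently.

```python
import pandas as pd

# Force object dtype so the Series is backed by an array of references.
s = pd.Series(["Ant", "Bear", "Cow"], dtype=object)

arr = s.to_numpy()           # the underlying NumPy object array
pointer_size = arr.itemsize  # bytes per stored reference (8 on a typical 64-bit build)

print(pointer_size)           # size of one pointer
print(len(s) * pointer_size)  # pointers only: this is what nbytes counts
print(s.nbytes)

# memory_usage(deep=True) also counts the Python str objects the pointers refer to,
# so it is larger than nbytes for an object-dtype Series.
shallow = s.memory_usage(index=False)
deep = s.memory_usage(index=False, deep=True)
print(shallow, deep)
```

If the goal is to estimate how much memory the strings really occupy, the deep figure is the more useful one. nbytes only tells you the size of the pointer array.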

Link: nbytes source code

Link: ndarray documentation