
Why does my date column change when I convert it to an ndarray?


Below is my DataFrame:

import pandas as pd
from pandas import Timestamp

df = pd.DataFrame({'Year': [Timestamp('2023-03-14 00:00:00'), Timestamp('2063-03-15 00:00:00'),
                            Timestamp('2043-03-21 00:00:00'), Timestamp('2053-10-09 00:00:00')],
                   'offset': [1, 9, 8, 1]})

When I convert my 'Year' column with to_list(), the values stay as Timestamp objects:

>>> df['Year'].to_list()
[Timestamp('2023-03-14 00:00:00'),
 Timestamp('2063-03-15 00:00:00'),
 Timestamp('2043-03-21 00:00:00'),
 Timestamp('2053-10-09 00:00:00')]

However, when I use .values, they come back as numpy datetime64 values:

>>> df['Year'].values
array(['2023-03-14T00:00:00.000000000', '2063-03-15T00:00:00.000000000',
       '2043-03-21T00:00:00.000000000', '2053-10-09T00:00:00.000000000'],
      dtype='datetime64[ns]')

How do I get an array of Timestamp objects (instead of datetime64 values)?


Solution

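  • It's converted to datetime64 because a NumPy array with a native dtype can only hold a fixed set of data types, and Timestamp objects are not one of them. A NumPy array stores its elements as fixed-size values in one contiguous block of memory, which NumPy's C backend operates on directly. pandas itself stores this column as datetime64[ns], and .values simply hands you that underlying array.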

    Starting with NumPy 1.7, the core dtypes datetime64 and timedelta64 were added to support date and time functionality, but they still store each value in memory as a 64-bit integer (here, the number of nanoseconds since 1970-01-01).

    You can create a NumPy array of Timestamp objects with np.array(df.Year.to_list()), but the result has dtype=object:

    array([Timestamp('2023-03-14 00:00:00'), Timestamp('2063-03-15 00:00:00'),
           Timestamp('2043-03-21 00:00:00'), Timestamp('2053-10-09 00:00:00')],
          dtype=object)
    

    For more information on what this entails, see this answer.

    Creating an array with dtype=object is different: the array's memory is now filled with pointers to Python objects stored elsewhere in memory (much like a Python list is really a list of pointers to objects, not the objects themselves). Operations on such an array therefore work on each Python object one at a time instead of running NumPy's vectorized C loops over raw values.
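To see the integer storage described above, here is a minimal sketch. It rebuilds the question's data with the column explicitly pinned to datetime64[ns] (an assumption, so the output matches the question regardless of pandas version) and then reinterprets the same bytes as int64 with ndarray.view, which does not copy anything.

```python
import numpy as np
import pandas as pd

# Rebuild the question's data; the dtype is pinned to datetime64[ns]
# so the result matches the question's output on any pandas version.
years = np.array(["2023-03-14", "2063-03-15", "2043-03-21", "2053-10-09"],
                 dtype="datetime64[ns]")
df = pd.DataFrame({"Year": years, "offset": [1, 9, 8, 1]})

arr = df["Year"].values
print(arr.dtype)     # datetime64[ns]
print(arr.itemsize)  # 8 bytes per element, the size of one int64

# Reinterpret the same memory as plain 64-bit integers (no copy):
# each value is a count of nanoseconds since 1970-01-01.
ints = arr.view("int64")
print(ints[0])       # 1678752000000000000
```

The view is only a way to inspect the storage; for date arithmetic you would normally keep working with the datetime64 values themselves.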
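If you do need actual Timestamp objects, two routes can be sketched as follows. The first is the np.array(df.Year.to_list()) call from the answer; the second uses Series.to_numpy(dtype=object), which asks pandas to box each value as a Timestamp itself. Either way the result is an object array of pointers, so this is a sketch for when you genuinely need per-element Timestamp objects, not a faster alternative to datetime64. The column is again pinned to datetime64[ns] as an assumption.

```python
import numpy as np
import pandas as pd

years = np.array(["2023-03-14", "2063-03-15", "2043-03-21", "2053-10-09"],
                 dtype="datetime64[ns]")
df = pd.DataFrame({"Year": years, "offset": [1, 9, 8, 1]})

# Option 1: build an object array from the list of Timestamps
ts_from_list = np.array(df["Year"].to_list())

# Option 2: let pandas box each value as a Timestamp directly
ts_from_numpy = df["Year"].to_numpy(dtype=object)

for arr in (ts_from_list, ts_from_numpy):
    print(arr.dtype, type(arr[0]).__name__)  # object Timestamp
```

Option 2 skips the intermediate Python list, but both produce the same dtype=object array.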