
What are all the dtypes that pandas recognizes?


For pandas, does anyone know whether every data type apart from

(i) float64, int64 (and other variants of np.number such as float32, int8, etc.)

(ii) bool

(iii) datetime64, timedelta64

(string columns, for example) always has a dtype of object?

Alternatively, I want to know whether there are any data types apart from (i), (ii) and (iii) in the list above for which pandas does not make the dtype object.


Solution

  • EDIT Feb 2020, following the pandas 1.0.0 release

    Pandas mostly uses NumPy arrays and dtypes for each Series (a DataFrame is a collection of Series, each of which can have its own dtype). NumPy's documentation further explains dtypes, data types, and data type objects. In addition, the answer by @lcameron05 gives an excellent description of the NumPy dtypes, and the pandas docs on dtypes have a lot of additional information.
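    A minimal sketch of the point above (the column names are made up for illustration): each column of a DataFrame is a Series with its own dtype, and DataFrame.dtypes reports one entry per column.

```python
import pandas as pd

# Hypothetical example columns; each column is stored as its own Series.
df = pd.DataFrame(
    {
        "ints": [1, 2, 3],
        "floats": [1.5, 2.5, 3.5],
        "flags": [True, False, True],
    }
)

# One dtype per column: ints int64, floats float64, flags bool
print(df.dtypes)

# Selecting a column gives a Series whose dtype matches the df.dtypes entry.
for name in df.columns:
    assert df[name].dtype == df.dtypes[name]
```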

    The main types stored in pandas objects are float, int, bool, datetime64[ns], timedelta64[ns], and object. In addition, these dtypes have item sizes, e.g. int64 and int32.
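    A small sketch of item sizes and of the datetime/timedelta types. The unit printed for the datetime column can vary by version (pandas 2.0 added resolutions other than ns), so the sketch checks the NumPy dtype kind rather than the exact unit.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# Same kind of data (signed integer), different item sizes in bytes.
s64 = s.astype("int64")
s32 = s.astype("int32")
print(s64.dtype, s64.dtype.itemsize)  # int64 8
print(s32.dtype, s32.dtype.itemsize)  # int32 4

# A datetime column, and the timedelta column obtained by subtracting from it.
dates = pd.Series(pd.to_datetime(["2020-01-01", "2020-02-01"]))
deltas = dates - dates.iloc[0]
print(dates.dtype, deltas.dtype)

# NumPy kind codes: "M" for datetime64, "m" for timedelta64
print(dates.dtype.kind, deltas.dtype.kind)  # M m
```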

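    By default, integer types are int64 and float types are float64, REGARDLESS of platform (32-bit or 64-bit). Building a DataFrame from plain Python integers, whether from a list, a dict of lists, or a scalar broadcast over an index, results in int64 dtypes.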
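    The defaults described above can be checked with a short sketch; the column name "a" is arbitrary.

```python
import pandas as pd

# Three ways of building an integer column from plain Python ints.
frames = [
    pd.DataFrame([1, 2], columns=["a"]),
    pd.DataFrame({"a": [1, 2]}),
    pd.DataFrame({"a": 1}, index=list(range(2))),  # scalar broadcast to 2 rows
]
for frame in frames:
    print(frame.dtypes["a"])  # int64

# Plain Python floats default to float64.
floats = pd.DataFrame({"a": [1.0, 2.0]})
print(floats.dtypes["a"])  # float64
```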

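    NumPy, however, will choose platform-dependent types when creating arrays, so a DataFrame built from a NumPy integer array WILL result in int32 on a 32-bit platform.

    One of the major changes in version 1.0.0 of pandas is the introduction of pd.NA, an experimental singleton meant to represent scalar missing values consistently across data types (instead of np.nan, pd.NaT or None, depending on the data type). It is used by the new nullable dtypes, which have to be requested explicitly.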
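    A sketch of both points above: NumPy's default integer (np.int_) is platform dependent and pandas keeps it when given a NumPy array, while pd.NA is the missing-value marker of the nullable dtypes. The sample values are arbitrary.

```python
import numpy as np
import pandas as pd

# NumPy picks its platform default integer, e.g. int32 on 32-bit builds.
arr = np.array([1, 2])
print(arr.dtype, np.dtype(np.int_))

# pandas keeps the array's dtype instead of forcing int64.
frame = pd.DataFrame(arr)
print(frame[0].dtype == arr.dtype)  # True

# pd.NA is shared by the nullable dtypes...
samples = {"Int64": [1, None], "string": ["a", None], "boolean": [True, None]}
for dtype, values in samples.items():
    print(dtype, pd.array(values, dtype=dtype)[1] is pd.NA)  # True for each

# ...while the default NumPy-backed dtypes still use np.nan / pd.NaT.
print(pd.Series([1.0, None]).iloc[1])  # nan
print(pd.Series([pd.Timestamp("2020-01-01"), None]).iloc[1])  # NaT
```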

    Pandas extends NumPy's type system and also allows users to write their own extension types. The following list covers all of the pandas extension types as of 1.0.0.
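    pandas.api.types.pandas_dtype turns a string alias into a dtype object, which gives a quick way to see which aliases map to extension types. In this sketch, 'UTC', 'D' and 'int' are example values chosen for the time zone, frequency and subtype parts of the aliases.

```python
import pandas as pd
from pandas.api.extensions import ExtensionDtype
from pandas.api.types import pandas_dtype

aliases = [
    "datetime64[ns, UTC]",
    "category",
    "period[D]",
    "Sparse[int]",
    "interval",
    "Int64",
    "string",
    "boolean",
]

# Resolve each alias and show which dtype class it maps to.
for alias in aliases:
    dtype = pandas_dtype(alias)
    print(f"{alias!r:24} -> {type(dtype).__name__}")

# Plain NumPy dtypes are not pandas extension types.
print(isinstance(pandas_dtype("int64"), ExtensionDtype))  # False
```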

    1) Time zone handling

    Kind of data: tz-aware datetime (note that NumPy does not support timezone-aware datetimes).

    Data type: DatetimeTZDtype

    Scalar: Timestamp

    Array: arrays.DatetimeArray

    String Aliases: 'datetime64[ns, <tz>]'
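    A minimal sketch of a tz-aware column; "UTC" is just an example time zone. The checks mirror the Data type, Scalar and Array entries above.

```python
import pandas as pd

# A tz-aware datetime column built from a date range.
s = pd.Series(pd.date_range("2020-01-01", periods=3, freq="D", tz="UTC"))
print(s.dtype)
print(isinstance(s.dtype, pd.DatetimeTZDtype))  # True
print(isinstance(s.iloc[0], pd.Timestamp))  # True
print(isinstance(s.array, pd.arrays.DatetimeArray))  # True
print(s.dt.tz)  # UTC
```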

    2) Categorical data

    Kind of data: Categorical

    Data type: CategoricalDtype

    Scalar: (none)

    Array: Categorical

    String Aliases: 'category'
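    A short sketch of categorical data with made-up labels. With the plain 'category' alias the categories are inferred (and sorted); an explicit CategoricalDtype fixes them and their order.

```python
import pandas as pd

s = pd.Series(["low", "high", "low", "medium"], dtype="category")
print(s.dtype)  # category
print(list(s.cat.categories))  # ['high', 'low', 'medium']
print(isinstance(s.dtype, pd.CategoricalDtype))  # True
print(isinstance(s.array, pd.Categorical))  # True

# An ordered dtype makes comparisons and min/max follow the given order.
sizes = pd.CategoricalDtype(["low", "medium", "high"], ordered=True)
ordered = s.astype(sizes)
print(ordered.max())  # high
```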

    3) Time span representation

    Kind of data: period (time spans)

    Data type: PeriodDtype

    Scalar: Period

    Array: arrays.PeriodArray

    String Aliases: 'period[<freq>]', 'Period[<freq>]'
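    A sketch of a period column, using monthly frequency ("M") as an example. Each Period is a span of time with a start and an end, not a single instant.

```python
import pandas as pd

s = pd.Series(pd.period_range("2020-01", periods=3, freq="M"))
print(s.dtype)  # period[M]
print(isinstance(s.dtype, pd.PeriodDtype))  # True
print(isinstance(s.iloc[0], pd.Period))  # True
print(isinstance(s.array, pd.arrays.PeriodArray))  # True

# The first period covers all of January 2020.
first = s.iloc[0]
print(first.start_time, first.end_time)
```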

    4) Sparse data structures

    Kind of data: sparse

    Data type: SparseDtype

    Scalar: (none)

    Array: arrays.SparseArray

    String Aliases: 'Sparse', 'Sparse[int]', 'Sparse[float]'
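    A sketch of sparse storage on mostly-zero example data: only values that differ from the fill value are stored, and density reports the stored fraction.

```python
import pandas as pd

dense = pd.Series([0, 0, 1, 0, 0, 2])
sparse = dense.astype(pd.SparseDtype("int64", fill_value=0))
print(sparse.dtype)  # Sparse[int64, 0]
print(sparse.sparse.density)  # 0.3333333333333333
print(isinstance(sparse.array, pd.arrays.SparseArray))  # True

# The string alias also works; integer subtypes default to a fill value of 0.
print(dense.astype("Sparse[int]").dtype.fill_value)  # 0
```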

    5) IntervalIndex

    Kind of data: intervals

    Data type: IntervalDtype

    Scalar: Interval

    Array: arrays.IntervalArray

    String Aliases: 'interval', 'Interval', 'Interval[<numpy_dtype>]', 'Interval[datetime64[ns, <tz>]]', 'Interval[timedelta64[<freq>]]'
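    A sketch of an interval column with example bounds. interval_range produces right-closed intervals by default, which the membership checks reflect.

```python
import pandas as pd

# Three right-closed intervals: (0, 1], (1, 2], (2, 3]
s = pd.Series(pd.interval_range(start=0, end=3))
print(s.dtype)
print(isinstance(s.dtype, pd.IntervalDtype))  # True
print(isinstance(s.array, pd.arrays.IntervalArray))  # True

first = s.iloc[0]
print(isinstance(first, pd.Interval))  # True
print(first)  # (0, 1]
print(0.5 in first, 0 in first)  # True False
```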

    6) Nullable integer data type

    Kind of data: nullable integer

    Data type: Int64Dtype, ...

    Scalar: (none)

    Array: arrays.IntegerArray

    String Aliases: 'Int8', 'Int16', 'Int32', 'Int64', 'UInt8', 'UInt16', 'UInt32', 'UInt64'
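    A sketch contrasting the default behaviour with the nullable "Int64" dtype on example data: a NumPy int64 column cannot hold a missing value, so pandas falls back to float64, whereas "Int64" keeps integers and marks the gap with pd.NA.

```python
import pandas as pd

# Default: the missing value forces an upcast to float64 (stored as NaN).
plain = pd.Series([1, 2, None])
print(plain.dtype)  # float64

# Nullable integer: values stay integers, the gap is pd.NA.
nullable = pd.Series([1, 2, None], dtype="Int64")
print(nullable.dtype)  # Int64
print(nullable.iloc[2] is pd.NA)  # True
print(isinstance(nullable.array, pd.arrays.IntegerArray))  # True
print(nullable.sum())  # 3
```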

    7) Working with text data

    Kind of data: Strings

    Data type: StringDtype

    Scalar: str

    Array: arrays.StringArray

    String Aliases: 'string'
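    A sketch of the dedicated string dtype on example data. Without a dtype, plain strings have historically been stored as object; newer pandas versions may pick a string dtype by default, so the sketch prints that dtype without asserting it.

```python
import pandas as pd

# Default inference (object in pandas 1.x/2.x; may differ in newer versions).
default = pd.Series(["a", "b", None])
print(default.dtype)

# Requesting "string" explicitly gives a StringDtype column that uses pd.NA.
text = pd.Series(["a", "b", None], dtype="string")
print(text.dtype)  # string
print(isinstance(text.dtype, pd.StringDtype))  # True
print(text.iloc[2] is pd.NA)  # True
print(type(text.iloc[0]).__name__)  # str
print(text.str.upper().tolist())
```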

    8) Boolean data with missing values

    Kind of data: Boolean (with NA)

    Data type: BooleanDtype

    Scalar: bool

    Array: arrays.BooleanArray

    String Aliases: 'boolean'
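    A sketch of the nullable boolean dtype on example data. Logical operators follow three-valued (Kleene) logic, so NA | True is True while NA & True stays NA.

```python
import pandas as pd

flags = pd.Series([True, False, None], dtype="boolean")
print(flags.dtype)  # boolean
print(flags.iloc[2] is pd.NA)  # True
print(isinstance(flags.array, pd.arrays.BooleanArray))  # True

# Kleene logic: the missing value is resolved when the result is certain.
either = (flags | True).tolist()
both = (flags & True).tolist()
print(either)  # [True, True, True]
print(both)    # [True, False, <NA>]
```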