For pandas, would anyone know if any datatype apart from
(i) float64, int64 (and other variants of np.number like float32, int8, etc.)
(ii) bool
(iii) datetime64, timedelta64
(such as string columns) always has a dtype of object?
Alternatively: are there any datatypes, apart from (i), (ii) and (iii) in the list above, that pandas does not give an object dtype?
EDIT Feb 2020 following pandas 1.0.0 release
Pandas mostly uses NumPy arrays and dtypes for each Series (a DataFrame is a collection of Series, each of which can have its own dtype). NumPy's documentation further explains dtype, data types, and data type objects. In addition, the answer by @lcameron05 gives an excellent description of the NumPy dtypes. Furthermore, the pandas docs on dtypes have a lot of additional information.
The main types stored in pandas objects are float, int, bool, datetime64[ns], timedelta64[ns], and object. In addition, these dtypes have item sizes, e.g. int64 and int32.
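To see these families side by side, here is a minimal sketch assuming a reasonably recent pandas; the column names and values are arbitrary. It checks dtype.kind (NumPy's one-letter type code) rather than the full dtype name, so it does not depend on the datetime resolution. The two object columns show the fallback: values with mixed Python types, or values NumPy has no dtype for (here decimal.Decimal), end up as object.

```python
from decimal import Decimal

import pandas as pd

# One column per "main" dtype family; the values are arbitrary examples.
df = pd.DataFrame({
    "f": [1.5, 2.5],
    "i": [1, 2],
    "b": [True, False],
    "dt": pd.to_datetime(["2020-01-01", "2020-02-01"]),
    "td": pd.to_timedelta(["1 day", "2 days"]),
    "mixed": [1, "a"],                        # mixed Python types -> object
    "dec": [Decimal("1.5"), Decimal("2.5")],  # NumPy has no decimal dtype -> object
})
print(df.dtypes)

# NumPy "kind" codes: f=float, i=signed int, b=bool,
# M=datetime64, m=timedelta64, O=object
kinds = {col: df[col].dtype.kind for col in df.columns}
print(kinds)
```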
By default, integer types are int64 and float types are float64, REGARDLESS of platform (32-bit or 64-bit); for example, a DataFrame built from a plain Python list of integers gets an int64 dtype.
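A minimal sketch of these defaults, assuming pandas and NumPy are installed; the column name "a" is arbitrary. The first three constructors start from Python ints and give int64. The last case starts from an existing np.array, and pandas simply keeps that array's dtype, which is whatever NumPy chose for the platform.

```python
import numpy as np
import pandas as pd

# Built from Python ints: pandas picks int64 regardless of platform.
frames = [
    pd.DataFrame([1, 2], columns=["a"]),
    pd.DataFrame({"a": [1, 2]}),
    pd.DataFrame({"a": 1}, index=list(range(2))),
]
for df in frames:
    print(df["a"].dtype)  # int64

# Built from Python floats: float64.
print(pd.DataFrame({"a": [1.0, 2.0]})["a"].dtype)  # float64

# Built from an existing NumPy array: pandas keeps NumPy's own dtype,
# which is platform dependent (int32 on a 32-bit platform).
arr = np.array([1, 2])
print(pd.DataFrame(arr)[0].dtype == arr.dtype)  # True
```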
NumPy, however, will choose platform-dependent types when creating arrays, so a DataFrame built from an existing NumPy integer array, such as pd.DataFrame(np.array([1, 2])), WILL result in int32 on a 32-bit platform.
One of the major changes in version 1.0.0 of pandas is the introduction of pd.NA to represent scalar missing values (rather than the previous values of np.nan, pd.NaT or None, depending on usage).
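A minimal sketch of what pd.NA changes, assuming pandas >= 1.0. It uses the nullable "Int64" extension dtype (one of the extension types in the list that follows); the values are arbitrary. Without an extension dtype, a missing value in an integer column forces a float64 upcast with NaN, while plain datetime64 columns keep using NaT.

```python
import numpy as np
import pandas as pd

# A plain int column with a gap is upcast to float64 and the gap becomes NaN.
s_float = pd.Series([1, None])
print(s_float.dtype)            # float64
print(np.isnan(s_float[1]))     # True

# With the nullable "Int64" extension dtype the values stay integers
# and the gap is the pd.NA scalar.
s_int = pd.Series([1, None], dtype="Int64")
print(s_int.dtype)              # Int64
print(s_int[1] is pd.NA)        # True

# pd.NA propagates through comparisons, whereas NaN compares as False.
print(pd.NA == 1)               # <NA>
print(np.nan == 1)              # False

# Plain datetime64 columns still use NaT for missing values.
s_dt = pd.Series(pd.to_datetime(["2020-01-01", None]))
print(s_dt[1] is pd.NaT)        # True
```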
Pandas extends NumPy's type system and also allows users to write their own extension types. The following lists all of pandas' extension types (as of 1.0.0).
1) Timezone-aware datetimes
Kind of data: tz-aware datetime (note that NumPy does not support timezone-aware datetimes).
Data type: DatetimeTZDtype
Scalar: Timestamp
Array: arrays.DatetimeArray
String Aliases: 'datetime64[ns, <tz>]'
2) Categorical data
Kind of data: Categorical
Data type: CategoricalDtype
Scalar: (none)
Array: Categorical
String Aliases: 'category'
3) Periods (time spans)
Kind of data: period (time spans)
Data type: PeriodDtype
Scalar: Period
Array: arrays.PeriodArray
String Aliases: 'period[<freq>]', 'Period[<freq>]'
4) Sparse data
Kind of data: sparse
Data type: SparseDtype
Scalar: (none)
Array: arrays.SparseArray
String Aliases: 'Sparse', 'Sparse[int]', 'Sparse[float]'
5) Intervals
Kind of data: intervals
Data type: IntervalDtype
Scalar: Interval
Array: arrays.IntervalArray
String Aliases: 'interval', 'Interval', 'Interval[<numpy_dtype>]', 'Interval[datetime64[ns, <tz>]]', 'Interval[timedelta64[<freq>]]'
6) Nullable integers
Kind of data: nullable integer
Data type: Int64Dtype, ...
Scalar: (none)
Array: arrays.IntegerArray
String Aliases: 'Int8', 'Int16', 'Int32', 'Int64', 'UInt8', 'UInt16', 'UInt32', 'UInt64'
7) Strings
Kind of data: Strings
Data type: StringDtype
Scalar: str
Array: arrays.StringArray
String Aliases: 'string'
8) Boolean data with missing values
Kind of data: Boolean (with NA)
Data type: BooleanDtype
Scalar: bool
Array: arrays.BooleanArray
String Aliases: 'boolean'
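To try the list above in practice, here is a minimal sketch assuming pandas >= 1.0. It builds one small Series per extension type, mostly via the string aliases listed, and checks that each dtype is a pandas ExtensionDtype rather than NumPy's object dtype. The sample values, the "UTC" timezone and the monthly period frequency are arbitrary choices.

```python
import pandas as pd

# One small Series per extension type listed above (values are arbitrary).
examples = {
    "tz-aware datetime": pd.Series(pd.date_range("2020-01-01", periods=2, tz="UTC")),
    "categorical": pd.Series(["a", "b", "a"], dtype="category"),
    "period": pd.Series(pd.period_range("2020-01", periods=2, freq="M")),
    "sparse": pd.Series([0, 0, 1], dtype="Sparse[int]"),
    "interval": pd.Series(pd.interval_range(start=0, end=2)),
    "nullable integer": pd.Series([1, None], dtype="Int64"),
    "string": pd.Series(["x", None], dtype="string"),
    "boolean": pd.Series([True, None], dtype="boolean"),
}

for kind, s in examples.items():
    # Each of these dtypes is a pandas extension type, not NumPy's object dtype.
    is_ext = isinstance(s.dtype, pd.api.extensions.ExtensionDtype)
    print(f"{kind:<18} {str(s.dtype):<24} extension={is_ext}")
```

The "string" entry bears directly on the question: with dtype="string", a text column is stored with StringDtype instead of object.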