I am trying to convert a column of time stamps that are currently of type object to proper datetime. When using pd.to_datetime() on the column I got this error
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 2821-11-10, at position 1818
I just want to find that value and correct it ie. should be 2021-11-10. But I can't find a way to search for that entry. The column is too large to physically look at every entry. Any tips?
Convert the dates to datatime and coerce errors. Then filter rows where an error occurred to either be fixed or dropped or further manipulated.
import pandas as pd
data = {
'Event': ['Meeting A', 'Meeting B', 'Presentation C', 'Workshop D'],
'Date': ['2023-01-15', '2923-01-18', '2023-02-01', '2023-02-05']
}
df = pd.DataFrame(data)
df['dt_date'] = pd.to_datetime(df['Date'], errors = 'coerce') # Note: '2923-01-18' throws error without coerce
df[df['dt_date'].isna()]
| Event | Date | dt_date | |
|---|---|---|---|
| 0 | Meeting A | 2023-01-15 | 2023-01-15 00:00:00 |
| 1 | Meeting B | 2923-01-18 | NaT |
| 2 | Presentation C | 2023-02-01 | 2023-02-01 00:00:00 |
| 3 | Workshop D | 2023-02-05 | 2023-02-05 00:00:00 |
| Event | Date | dt_date | |
|---|---|---|---|
| 1 | Meeting B | 2923-01-18 | NaT |
Based on details, exact value causing error is 2821-11-10. You can get that row with df[df['Date'] == "2821-11-10"] or change that value with a replace function like df['date_clean'] = df['Date'].replace("2821-11-10", "2021-11-10").
I recommend taking the first approach to identify how many errors exist in the data and then handling the fixing/replacing.