Text can be used as input to pandas dataframes to make easily reproducible examples for testing solutions.1
import pandas as pd
from io import StringIO
txt= """
ID,datetime,value
AB-CL-34,07/10/2022 10:00:00,5
AB-CL-34,07/10/2022 11:15:10,7
AB-CL-34,09/10/2022 15:30:30,13
BX-RT-55,06/10/2022 11:30:22,0
BX-RT-55,10/10/2022 22:44:11,1
BX-RT-55,10/10/2022 23:30:22,6
"""
df = pd.read_csv(StringIO(txt), parse_dates=[1], dayfirst=True)
How do I index df
such that there is an integer index according to each ID, keeping the ID variable?
imagined output
ID datetime value
0 AB-CL-34 07/10/2022 10:00:00 5
0 AB-CL-34 07/10/2022 11:15:10 7
0 AB-CL-34 09/10/2022 15:30:30 13
1 BX-RT-55 06/10/2022 11:30:22 0
1 BX-RT-55 10/10/2022 22:44:11 1
1 BX-RT-55 10/10/2022 23:30:22 6
edit ID values are now chr/str, not int, as they should have been (apologies to Marcus_CH.
You could try:
df = pd.read_csv(StringIO(txt),\
parse_dates=[1],\
dayfirst=True)\
.assign(id_index= lambda df: df\
.groupby('ID', sort=False).ngroup())\
.set_index("id_index")\
.rename_axis(index=None)
Output
+----+----------+---------------------+---------+
| | ID | datetime | value |
|----+----------+---------------------+---------|
| 0 | AB-CL-34 | 2022-10-07 10:00:00 | 5 |
| 0 | AB-CL-34 | 2022-10-07 11:15:10 | 7 |
| 0 | AB-CL-34 | 2022-10-09 15:30:30 | 13 |
| 1 | BX-RT-55 | 2022-10-06 11:30:22 | 0 |
| 1 | BX-RT-55 | 2022-10-10 22:44:11 | 1 |
| 1 | BX-RT-55 | 2022-10-10 23:30:22 | 6 |
+----+----------+---------------------+---------+
References:
index=None
to remove the index name (try it without this statement)df["id_col"] = df["ID"] - 1
. lambda
makes it easier and faster.\
for linebreak (so it's just for styling)