Say I have a simple dataset (namely df1) like this:
ID Name Max_FileID
--------------------------------
1 Dog 3
1 Dog 3
1 Dog 3
2 Bird 1
3 Cat 5
3 Cat 5
I would like to group the dataset by ID (I have no problem with this):
df1.groupby('ID')
Then add a new column using Max_FileID + Rank.
The result should look like this
ID Name Max_FileID Rank
------------------------------------------
1 Dog 3 4
1 Dog 3 5
1 Dog 3 6
2 Bird 1 2
3 Cat 5 6
3 Cat 5 7
To help understand better what I am trying to achieve, here is a SQL equivalent:
SELECT
ID,
Name,
Max_FileID,
Max_FileID + ROW_NUMBER() OVER(PARTITION BY ID ORDER BY ID)
FROM df1
In addition, I created a custom function:
def sequential_ranking(self, iterable, max_id):
try:
max_id = max_id - 1
seq_nums = list()
for num in iterable:
max_id += 1
seq_nums.append(max_id)
return seq_nums
Not sure if there is a built-in function for it. Thanks in advance!!
You can use DataFrameGroupBy.rank
function which returns the rank values in the group and if you specify the method parameter to first
then ranks are assigned in order they appear in the group.
You can use this:
df["Rank"] = df["Max_FileID"] + df.groupby("ID")["Max_FileID"].rank(method="first").astype(int)
Result:
>>> print(df)
ID Name Max_FileID Rank
0 1 Dog 3 4
1 1 Dog 3 5
2 1 Dog 3 6
3 2 Bird 1 2
4 3 Cat 5 6
5 3 Cat 5 7