Beginning with something like this:
SUBJECT TIME C_INTAKE
C02 0.00 0.0
C02 48.00 0.2
C02 49.31 0.4
C02 51.20 0.4
C02 52.61 0.4
C02 55.61 0.6
C02 82.77 0.9
C02 84.97 1.1
C02 86.39 1.1
.
.
C03 0.00 0.0
I'd like to:
SUBJECT BINTIME C_INTAKE2
C02 0.00 0.0
C02 10.00 0.0
C02 20.00 0.0
C02 30.00 0.0
C02 40.00 0.0
C02 50.00 0.4
C02 60.00 0.6
C02 70.00 0.6
C02 80.00 0.6
C02 90.00 1.1
C02 100.00 1.1
.
.
C03 0.00 0.0
Thank you for your advice
EDIT: Thank you for your input so far. In trying to adapt it to a larger data set, I've come across an error. I am brand new to this (but learning from this forum!) so I'm afraid I'm not able to troubleshoot it on my own.
import pandas as pd
import numpy as np
# Import data from url
url = "https://raw.githubusercontent.com/yadatree/AL/main/intake.csv"
intake=pd.read_csv(url)
# select data
subject = intake['SUBJECT']
time = intake['NTIME']
c_intake = intake['cintake']
df = pd.DataFrame({'subject': subject, 'time': time, 'c_intake': c_intake})
# Create bins
bins = list(range(0,82801, 92))
# calculate cumulative sum by subject and bin
df.groupby([df'subject, pd.cut(df.time, bins)]) \
.agg({'c_intake': 'sum'}) \
.groupby('subject') \
.cumsum() \
.reset_index()
Ouput:
File "<ipython-input-32-ffea10b96606>", line 19
df.groupby([df'subject, pd.cut(df.time, bins)]) .agg({'c_intake': 'sum'}) .groupby('subject') .cumsum() .reset_index()
^
SyntaxError: invalid syntax
Assuming you want to sum c_intake
within a subject and bin group, the following solution should work.
import pandas as pd
import numpy as np
# Create sample data
subjects = [item for subject in ['C02', 'C03', 'C04'] for item in [subject] * 10]
time = np.random.randint(0,100, 30)
c_intake = np.random.rand(30)
df = pd.DataFrame({'subject': subjects, 'time': time, 'c_intake': c_intake})
# Create desired bins
bins = list(range(0,101, 10))
# calculate cumulative sum by subject and bin
df.groupby(['subject', pd.cut(df.time, bins)]) \
.agg({'c_intake': 'sum'}) \
.groupby('subject') \
.cumsum() \
.reset_index()