The data I am working with:
data (140631115432592), ndim: 2, size: 3947910, shape: (232230, 17)
VIN (1-10) object
County object
City object
State object
Postal Code float64
Model Year int64
Make object
Model object
Electric Vehicle Type object
Clean Alternative Fuel Vehicle (CAFV) Eligibility object
Electric Range float64
Base MSRP float64
Legislative District float64
DOL Vehicle ID int64
Vehicle Location object
Electric Utility object
2020 Census Tract float64
dtype: object
VIN (1-10) County City State Postal Code ... Legislative District DOL Vehicle ID Vehicle Location Electric Utility 2020 Census Tract
0 2T3YL4DV0E King Bellevue WA 98005.0 ... 41.0 186450183 POINT (-122.1621 47.64441) PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA) 5.303302e+10
1 5YJ3E1EB6K King Bothell WA 98011.0 ... 1.0 478093654 POINT (-122.20563 47.76144) PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA) 5.303302e+10
2 5UX43EU02S Thurston Olympia WA 98502.0 ... 35.0 274800718 POINT (-122.92333 47.03779) PUGET SOUND ENERGY INC 5.306701e+10
3 JTMAB3FV5R Thurston Olympia WA 98513.0 ... 2.0 260758165 POINT (-122.81754 46.98876) PUGET SOUND ENERGY INC 5.306701e+10
4 5YJYGDEE8M Yakima Selah WA 98942.0 ... 15.0 236581355 POINT (-120.53145 46.65405) PACIFICORP 5.307700e+10
Data in csv format:
VIN (1-10),County,City,State,Postal Code,Model Year,Make,Model,Electric Vehicle Type,Clean Alternative Fuel Vehicle (CAFV) Eligibility,Electric Range,Base MSRP,Legislative District,DOL Vehicle ID,Vehicle Location,Electric Utility,2020 Census Tract
2T3YL4DV0E,King,Bellevue,WA,98005,2014,TOYOTA,RAV4,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,103,0,41,186450183,POINT (-122.1621 47.64441),PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA),53033023604
5YJ3E1EB6K,King,Bothell,WA,98011,2019,TESLA,MODEL 3,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,220,0,1,478093654,POINT (-122.20563 47.76144),PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA),53033022102
5UX43EU02S,Thurston,Olympia,WA,98502,2025,BMW,X5,Plug-in Hybrid Electric Vehicle (PHEV),Clean Alternative Fuel Vehicle Eligible,40,0,35,274800718,POINT (-122.92333 47.03779),PUGET SOUND ENERGY INC,53067011902
JTMAB3FV5R,Thurston,Olympia,WA,98513,2024,TOYOTA,RAV4 PRIME,Plug-in Hybrid Electric Vehicle (PHEV),Clean Alternative Fuel Vehicle Eligible,42,0,2,260758165,POINT (-122.81754 46.98876),PUGET SOUND ENERGY INC,53067012332
5YJYGDEE8M,Yakima,Selah,WA,98942,2021,TESLA,MODEL Y,Battery Electric Vehicle (BEV),Eligibility unknown as battery range has not been researched,0,0,15,236581355,POINT (-120.53145 46.65405),PACIFICORP,53077003200
3C3CFFGE1G,Thurston,Olympia,WA,98501,2016,FIAT,500,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,84,0,22,294762219,POINT (-122.89166 47.03956),PUGET SOUND ENERGY INC,53067010802
5YJ3E1EA4J,Snohomish,Marysville,WA,98271,2018,TESLA,MODEL 3,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,215,0,39,270125096,POINT (-122.1677 48.11026),PUGET SOUND ENERGY INC,53061052808
5YJ3E1EA3K,King,Seattle,WA,98102,2019,TESLA,MODEL 3,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,220,0,43,238776492,POINT (-122.32427 47.63433),CITY OF SEATTLE - (WA)|CITY OF TACOMA - (WA),53033006600
1N4AZ0CP5E,Thurston,Yelm,WA,98597,2014,NISSAN,LEAF,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,84,0,2,257246118,POINT (-122.60735 46.94239),PUGET SOUND ENERGY INC,53067012421
Filtering and grouping:
filt = (data["Model Year"] >= 2018) & (data["Electric Vehicle Type"] == "Battery Electric Vehicle (BEV)")
data = data[filt].groupby(["State", "Make"], sort=False, observed=True, as_index=False).agg( avg_electric_range=pd.NamedAgg(column="Electric Range", aggfunc="mean"), oldest_model_year=pd.NamedAgg(column="Model Year", aggfunc="min"))
Currently it yields the following table:
State Make avg_electric_range oldest_model_year
0 WA TESLA 52.143448 2018
1 WA NISSAN 60.051874 2018
<snip>
How do I add a Count
column which shows the count of each group which is used for further filtering? Note: rule out apply
as everything should stay in Pandas'land.
Your question would benefit from a minimal reproducible example.
That said, the count doesn't really depend on a particular column, as long as you don't have missing values, thus pick any one that matches this criterion and add another aggregation (you can use one of the grouping columns or Model Year
since you know it must be a valid number):
out = (data[filt].groupby(["State", "Make"], sort=False, observed=True, as_index=False)
.agg(avg_electric_range=pd.NamedAgg(column="Electric Range", aggfunc="mean"),
oldest_model_year=pd.NamedAgg(column="Model Year", aggfunc="min"),
count=pd.NamedAgg(column="Model Year", aggfunc="count"),
)
)
Example output:
State Make avg_electric_range oldest_model_year count
0 WA X 0.5 2018 2
1 WA Y 3.0 2018 3