pythonpandas-groupby

How to I get rid of the zeros in the final column


I'm working on an assignment of Applied Data Science.

Question: Cut % Renewable into 5 bins. Group Top15 by the Continent, as well as these new % Renewable bins. How many countries are in each of these groups? This function should return a Series with a MultiIndex of Continent, then the bins for % Renewable. Do not include groups with no countries.

This is my code:

def answer_twelve():

    Top15 = answer_one()
    ContinentDict  = {'China':'Asia', 
                  'United States':'North America', 
                  'Japan':'Asia', 
                  'United Kingdom':'Europe', 
                  'Russian Federation':'Europe', 
                  'Canada':'North America', 
                  'Germany':'Europe', 
                  'India':'Asia',
                  'France':'Europe', 
                  'South Korea':'Asia', 
                  'Italy':'Europe', 
                  'Spain':'Europe', 
                  'Iran':'Asia',
                  'Australia':'Australia', 
                  'Brazil':'South America'}
    Top15['Continent'] = Top15.index.to_series().map(ContinentDict)
    Top15['bins'] = pd.cut(Top15['% Renewable'],5)
    return pd.Series(Top15.groupby(by = ['Continent', 'bins']).size())#,apply(lambda x:s if x['Rank']==0 continue))
answer_twelve()

This is my output for the above code

Continent      bins            
Asia           (2.212, 15.753]     4
               (15.753, 29.227]    1
               (29.227, 42.701]    0
               (42.701, 56.174]    0
               (56.174, 69.648]    0
Australia      (2.212, 15.753]     1
               (15.753, 29.227]    0
               (29.227, 42.701]    0
               (42.701, 56.174]    0
               (56.174, 69.648]    0
Europe         (2.212, 15.753]     1
               (15.753, 29.227]    3
               (29.227, 42.701]    2
               (42.701, 56.174]    0
               (56.174, 69.648]    0
North America  (2.212, 15.753]     1
               (15.753, 29.227]    0
               (29.227, 42.701]    0
               (42.701, 56.174]    0
               (56.174, 69.648]    1
South America  (2.212, 15.753]     0
               (15.753, 29.227]    0
               (29.227, 42.701]    0
               (42.701, 56.174]    0
               (56.174, 69.648]    1
dtype: int64

Required output is

Continent      bins            
Asia           (2.212, 15.753]     4
               (15.753, 29.227]    1
Australia      (2.212, 15.753]     1
Europe         (2.212, 15.753]     1
               (15.753, 29.227]    3
               (29.227, 42.701]    2
North America  (2.212, 15.753]     1
               (56.174, 69.648]    1
South America  (56.174, 69.648]    1
Name: Countries, dtype: int64

I want to get rid of the zeros, i tried using

pd.Series(Top15.groupby(by = ['Continent', 'bins']).size().apply(lambda x:s if x['Rank']==0 continue))

But I keep getting the following error

File "<ipython-input-317-14bc05bb2137>", line 20
    return pd.Series(Top15.groupby(by = ['Continent', 'bins']).size().apply(lambda x:s if x['Rank']==0 continue))
                                                                                                              ^
SyntaxError: invalid syntax

I'm unable to figure out my mistake. Please help me!


Solution

  • Use pandas and just drop the rows when the column is zero

    if column_name is your column:

    df = df[df.column_name != 0]