dataframegroup-byjuliadataframes.jl

Calculate percentage change by year and sector in Julia


I have dataframe as follow:

julia> println(df_roa_kvkt)
15×3 DataFrame
 Row │ year   sector                    roa 
     │ Int64  String                    Float64
─────┼─────────────────────────────────────────
   1 │  2017  Construction              -1.15
   2 │  2017  Services                  -1.78
   3 │  2017  Agriculture               -1.82
   4 │  2018  Construction              -1.05
   5 │  2018  Services                  -1.56
   6 │  2018  Agriculture               -1.55
   7 │  2019  Construction              -1.32
   8 │  2019  Services                  -1.53
   9 │  2019  Agriculture               -1.38
  10 │  2020  Construction              -1.11
  11 │  2020  Services                  -1.93
  12 │  2020  Agriculture               -1.21
  13 │  2021  Construction              -0.03
  14 │  2021  Services                  -0.06
  15 │  2021  Agriculture               -0.36
using DataFrames

df_roa_kvkt = DataFrame(year = [2017, 2017, 2017, 2018, 2018, 2018, 2019, 2019, 2019, 2020, 2020, 2020, 2021, 2021, 2021],
                     sector = ["Construction", "Services", "Agriculture", "Construction", "Services", "Agriculture", "Construction", "Services", "Agriculture", "Construction", "Services", "Agriculture", "Construction", "Services", "Agriculture"],
                     roa = [-1.15, -1.78, -1.82, -1.05, -1.56, -1.55, -1.32, -1.53, -1.38, -1.11, -1.93, -1.21, -0.03, -0.06, -0.36])

I want to calcalate the percentage change by each year for each sector. I have used the following codes:

function pct_change(input::AbstractVector{<:Number})
    [i == 1 ? missing : (input[i]-input[i-1])/input[i-1] for i in eachindex(input)]
end

sort!(df_roa_kvkt, [:year, :sector])
combine(groupby(df_roa_kvkt, [:year, :sector]), :roa => (x -> pct_change(x)) => :proportion)

But the results are all missing in proportion columns. The result I hope to see is that 4 columns with year, sector, roa and proportion columns.


Solution

  • The issue is because you're grouping by sector and year, which means every row is the first one in it's group, which means every row of proportion is missing.

    The solution is to group just by sector. This drops the year column, so if we want to keep it, then (I believe the only way is to) use transform( or transform!) instead:

    transform(groupby(df_roa_kvkt, :sector), :roa => (x -> pct_change(x)) => :proportion)
    

    Output:

    15×4 DataFrame
     Row │ year   sector        roa      proportion      
         │ Int64  String        Float64  Float64?        
    ─────┼───────────────────────────────────────────────
       1 │  2017  Agriculture     -1.82  missing         
       2 │  2017  Construction    -1.15  missing         
       3 │  2017  Services        -1.78  missing         
       4 │  2018  Agriculture     -1.55       -0.148352
       5 │  2018  Construction    -1.05       -0.0869565
      ⋮  │   ⋮         ⋮           ⋮            ⋮
      11 │  2020  Construction    -1.11       -0.159091
      12 │  2020  Services        -1.93        0.261438
      13 │  2021  Agriculture     -0.36       -0.702479
      14 │  2021  Construction    -0.03       -0.972973
      15 │  2021  Services        -0.06       -0.968912
                                           5 rows omitted