I have dataframe as follow:
julia> println(df_roa_kvkt)
15×3 DataFrame
Row │ year sector roa
│ Int64 String Float64
─────┼─────────────────────────────────────────
1 │ 2017 Construction -1.15
2 │ 2017 Services -1.78
3 │ 2017 Agriculture -1.82
4 │ 2018 Construction -1.05
5 │ 2018 Services -1.56
6 │ 2018 Agriculture -1.55
7 │ 2019 Construction -1.32
8 │ 2019 Services -1.53
9 │ 2019 Agriculture -1.38
10 │ 2020 Construction -1.11
11 │ 2020 Services -1.93
12 │ 2020 Agriculture -1.21
13 │ 2021 Construction -0.03
14 │ 2021 Services -0.06
15 │ 2021 Agriculture -0.36
using DataFrames
df_roa_kvkt = DataFrame(year = [2017, 2017, 2017, 2018, 2018, 2018, 2019, 2019, 2019, 2020, 2020, 2020, 2021, 2021, 2021],
sector = ["Construction", "Services", "Agriculture", "Construction", "Services", "Agriculture", "Construction", "Services", "Agriculture", "Construction", "Services", "Agriculture", "Construction", "Services", "Agriculture"],
roa = [-1.15, -1.78, -1.82, -1.05, -1.56, -1.55, -1.32, -1.53, -1.38, -1.11, -1.93, -1.21, -0.03, -0.06, -0.36])
I want to calcalate the percentage change by each year for each sector. I have used the following codes:
function pct_change(input::AbstractVector{<:Number})
[i == 1 ? missing : (input[i]-input[i-1])/input[i-1] for i in eachindex(input)]
end
sort!(df_roa_kvkt, [:year, :sector])
combine(groupby(df_roa_kvkt, [:year, :sector]), :roa => (x -> pct_change(x)) => :proportion)
But the results are all missing in proportion columns. The result I hope to see is that 4 columns with year, sector, roa and proportion columns.
The issue is because you're grouping by sector and year, which means every row is the first one in it's group, which means every row of proportion
is missing
.
The solution is to group just by sector. This drops the year column, so if we want to keep it, then (I believe the only way is to) use transform
( or transform!
) instead:
transform(groupby(df_roa_kvkt, :sector), :roa => (x -> pct_change(x)) => :proportion)
Output:
15×4 DataFrame
Row │ year sector roa proportion
│ Int64 String Float64 Float64?
─────┼───────────────────────────────────────────────
1 │ 2017 Agriculture -1.82 missing
2 │ 2017 Construction -1.15 missing
3 │ 2017 Services -1.78 missing
4 │ 2018 Agriculture -1.55 -0.148352
5 │ 2018 Construction -1.05 -0.0869565
⋮ │ ⋮ ⋮ ⋮ ⋮
11 │ 2020 Construction -1.11 -0.159091
12 │ 2020 Services -1.93 0.261438
13 │ 2021 Agriculture -0.36 -0.702479
14 │ 2021 Construction -0.03 -0.972973
15 │ 2021 Services -0.06 -0.968912
5 rows omitted