# Calculate percentage change by year and sector in Julia

I have dataframe as follow:

``````julia> println(df_roa_kvkt)
15×3 DataFrame
Row │ year   sector                    roa
│ Int64  String                    Float64
─────┼─────────────────────────────────────────
1 │  2017  Construction              -1.15
2 │  2017  Services                  -1.78
3 │  2017  Agriculture               -1.82
4 │  2018  Construction              -1.05
5 │  2018  Services                  -1.56
6 │  2018  Agriculture               -1.55
7 │  2019  Construction              -1.32
8 │  2019  Services                  -1.53
9 │  2019  Agriculture               -1.38
10 │  2020  Construction              -1.11
11 │  2020  Services                  -1.93
12 │  2020  Agriculture               -1.21
13 │  2021  Construction              -0.03
14 │  2021  Services                  -0.06
15 │  2021  Agriculture               -0.36
``````
``````using DataFrames

df_roa_kvkt = DataFrame(year = [2017, 2017, 2017, 2018, 2018, 2018, 2019, 2019, 2019, 2020, 2020, 2020, 2021, 2021, 2021],
sector = ["Construction", "Services", "Agriculture", "Construction", "Services", "Agriculture", "Construction", "Services", "Agriculture", "Construction", "Services", "Agriculture", "Construction", "Services", "Agriculture"],
roa = [-1.15, -1.78, -1.82, -1.05, -1.56, -1.55, -1.32, -1.53, -1.38, -1.11, -1.93, -1.21, -0.03, -0.06, -0.36])
``````

I want to calcalate the percentage change by each year for each sector. I have used the following codes:

``````function pct_change(input::AbstractVector{<:Number})
[i == 1 ? missing : (input[i]-input[i-1])/input[i-1] for i in eachindex(input)]
end

sort!(df_roa_kvkt, [:year, :sector])
combine(groupby(df_roa_kvkt, [:year, :sector]), :roa => (x -> pct_change(x)) => :proportion)
``````

But the results are all missing in proportion columns. The result I hope to see is that 4 columns with year, sector, roa and proportion columns.

Solution

• The issue is because you're grouping by sector and year, which means every row is the first one in it's group, which means every row of `proportion` is `missing`.

The solution is to group just by sector. This drops the year column, so if we want to keep it, then (I believe the only way is to) use `transform`( or `transform!`) instead:

``````transform(groupby(df_roa_kvkt, :sector), :roa => (x -> pct_change(x)) => :proportion)
``````

Output:

``````15×4 DataFrame
Row │ year   sector        roa      proportion
│ Int64  String        Float64  Float64?
─────┼───────────────────────────────────────────────
1 │  2017  Agriculture     -1.82  missing
2 │  2017  Construction    -1.15  missing
3 │  2017  Services        -1.78  missing
4 │  2018  Agriculture     -1.55       -0.148352
5 │  2018  Construction    -1.05       -0.0869565
⋮  │   ⋮         ⋮           ⋮            ⋮
11 │  2020  Construction    -1.11       -0.159091
12 │  2020  Services        -1.93        0.261438
13 │  2021  Agriculture     -0.36       -0.702479
14 │  2021  Construction    -0.03       -0.972973
15 │  2021  Services        -0.06       -0.968912
5 rows omitted
``````