Say I have a DataFrame in Julia:
using DataFrames, Dates

# Sample data
df = DataFrame(
    id = [1, 1, 2, 2, 3, 3],
    filing_date = Date.(["2022-01-01", "2022-02-01", "2022-01-10", "2022-02-20", "2022-01-15", "2022-03-01"]),
    col1 = [100, 110, 200, 220, 50, 55],
    col2 = [1000, 1050, 2000, 2100, 500, 525]
)
I would like to compute, for each id that has enough data, the percent change in each column between its two most recent filings. The method below works, but in reality I have 50 columns instead of 2, so I need an efficient way to do this for all columns.
# Percent change between the two most recent filings for a given column
function percent_change_column(gdf::SubDataFrame, column::Symbol)
    # Sort a copy (newest first) rather than mutating the group in place
    sorted = sort(gdf, :filing_date, rev=true)
    if nrow(sorted) < 2
        return missing
    end
    return (sorted[1, column] - sorted[2, column]) / sorted[2, column] * 100
end
# Calculate percent change for each column
col1_change = combine(groupby(df, :id), gdf -> DataFrame(col1_change = percent_change_column(gdf, :col1)))
col2_change = combine(groupby(df, :id), gdf -> DataFrame(col2_change = percent_change_column(gdf, :col2)))
# Merge the results
result = leftjoin(col1_change, col2_change, on=:id)
The end result would be:
 Row │ id     col1_change  col2_change
     │ Int64  Float64      Float64?
─────┼─────────────────────────────────
   1 │     1         10.0          5.0
   2 │     2         10.0          5.0
   3 │     3         10.0          5.0
But as I mentioned, I would need to do this for 50 columns.
How can I solve this problem? Is there a way to apply these operations over all columns?
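It sounds like you want the cols => function => target_cols syntax described in the DataFrames.jl documentation, broadcast over every column except id and filing_date. Note that the version below assumes the rows within each id are already in ascending filing_date order, and that it returns one row per consecutive pair of filings (so exactly one row for an id with two filings, and none for an id with only one):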
cols = names(df, Not([:id, :filing_date]))
combine(groupby(df, :id), cols .=> (x -> diff(x) .* 100 ./ x[1:end-1]) .=> cols .* "_change")
Output:
3×3 DataFrame
 Row │ id     col1_change  col2_change
     │ Int64  Float64      Float64
─────┼─────────────────────────────────
   1 │     1         10.0          5.0
   2 │     2         10.0          5.0
   3 │     3         10.0          5.0
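If some ids have only one filing, or the rows are not already ordered by date, a variant along these lines may be closer to what the original function does. This is only a sketch under a few assumptions: the helper name latest_pct_change is made up for illustration, and the sample data is altered (id 1 is listed out of date order and a single-row id 4 is added) to exercise both cases. It sorts first, compares the latest value with the one before it, and returns missing when there is nothing to compare.

```julia
using DataFrames, Dates

# Illustrative data (not the original sample): id 1 is out of date order
# and id 4 has only a single filing.
df = DataFrame(
    id = [1, 1, 2, 2, 3, 3, 4],
    filing_date = Date.(["2022-02-01", "2022-01-01", "2022-01-10", "2022-02-20",
                         "2022-01-15", "2022-03-01", "2022-01-05"]),
    col1 = [110, 100, 200, 220, 50, 55, 70],
    col2 = [1050, 1000, 2000, 2100, 500, 525, 700],
)

# Hypothetical helper: percent change from the second-latest to the latest
# value of a column, or `missing` if the group has fewer than two rows.
# It expects the values to be in ascending date order.
latest_pct_change(x) = length(x) < 2 ? missing : 100 * (x[end] - x[end-1]) / x[end-1]

# Every column except the grouping key and the date column
value_cols = names(df, Not([:id, :filing_date]))

# Sort once up front; groupby keeps the row order within each group
sorted = sort(df, [:id, :filing_date])

result = combine(groupby(sorted, :id),
                 value_cols .=> latest_pct_change .=> value_cols .* "_change")
```

Because the helper returns a scalar, combine produces exactly one row per id, and moving to the 50-column case only changes what names(df, Not(...)) selects.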