dataframejulia

# Efficient way to calculate change for multiple columns in Julia

Say I have a DataFrame in Julia:

``````using DataFrames, Dates

# Sample data
df = DataFrame(
id = [1, 1, 2, 2, 3, 3],
filing_date = Date.(["2022-01-01", "2022-02-01", "2022-01-10", "2022-02-20", "2022-01-15", "2022-03-01"]),
col1 = [100, 110, 200, 220, 50, 55],
col2 = [1000, 1050, 2000, 2100, 500, 525]
)
``````

I would like to compute the percent change over a period in each column for each id which has the relevant data. I have the following method, but in reality I have 50 columns instead of 2, so need a way to do it for all columns in an efficient way.

``````# Function to compute percent change for a given column
function percent_change_column(gdf::SubDataFrame, column::Symbol)
sort!(gdf, :filing_date, rev=true)
if nrow(gdf) == 1
return missing
end
return (gdf[1, column] - gdf[2, column]) / gdf[2, column] * 100
end

# Calculate percent change for each column
col1_change = combine(groupby(df, :id), gdf -> DataFrame(col1_change = percent_change_column(gdf, :col1)))
col2_change = combine(groupby(df, :id), gdf -> DataFrame(col2_change = percent_change_column(gdf, :col2)))

# Merge the results
result = leftjoin(col1_change, col2_change, on=:id)
``````

The end result would be:

``````Row id  col1_change col2_change
Int64   Float64 Float64?
1   1   10.0    5.0
2   2   10.0    5.0
3   3   10.0    5.0
``````

But as I mentioned, I would need to do this for 50 columns.

How can I solve this problem? Is there a way to apply these operations over all columns?

Solution

• It seems like you're wanting something like this - using the `cols => function => target_cols` syntax described in the Dataframes.jl documentation:

``````combine(groupby(df, :id), names(df, Not([:id, :filing_date])) .=> (x -> diff(x).*100 ./ x[1:end-1]) .=> names(df, Not([:id, :filing_date])) .* "_change" )
``````

Output:

``````3×3 DataFrame
Row │ id     col1_change  col2_change
│ Int64  Float64      Float64
─────┼─────────────────────────────────
1 │     1         10.0          5.0
2 │     2         10.0          5.0
3 │     3         10.0          5.0
``````