dataframejulia

Efficient way to calculate change for multiple columns in Julia


Say I have a DataFrame in Julia:

using DataFrames, Dates

# Sample data
df = DataFrame(
    id = [1, 1, 2, 2, 3, 3],
    filing_date = Date.(["2022-01-01", "2022-02-01", "2022-01-10", "2022-02-20", "2022-01-15", "2022-03-01"]),
    col1 = [100, 110, 200, 220, 50, 55],
    col2 = [1000, 1050, 2000, 2100, 500, 525]
)

I would like to compute the percent change over a period in each column for each id which has the relevant data. I have the following method, but in reality I have 50 columns instead of 2, so need a way to do it for all columns in an efficient way.

# Function to compute percent change for a given column
function percent_change_column(gdf::SubDataFrame, column::Symbol)
    sort!(gdf, :filing_date, rev=true)
    if nrow(gdf) == 1
        return missing
    end
    return (gdf[1, column] - gdf[2, column]) / gdf[2, column] * 100
end

# Calculate percent change for each column
col1_change = combine(groupby(df, :id), gdf -> DataFrame(col1_change = percent_change_column(gdf, :col1)))
col2_change = combine(groupby(df, :id), gdf -> DataFrame(col2_change = percent_change_column(gdf, :col2)))

 

# Merge the results
result = leftjoin(col1_change, col2_change, on=:id)

The end result would be:

Row id  col1_change col2_change
Int64   Float64 Float64?
1   1   10.0    5.0
2   2   10.0    5.0
3   3   10.0    5.0

But as I mentioned, I would need to do this for 50 columns.

How can I solve this problem? Is there a way to apply these operations over all columns?


Solution

  • It seems like you're wanting something like this - using the cols => function => target_cols syntax described in the Dataframes.jl documentation:

    combine(groupby(df, :id), names(df, Not([:id, :filing_date])) .=> (x -> diff(x).*100 ./ x[1:end-1]) .=> names(df, Not([:id, :filing_date])) .* "_change" )
    

    Output:

    3×3 DataFrame
     Row │ id     col1_change  col2_change 
         │ Int64  Float64      Float64     
    ─────┼─────────────────────────────────
       1 │     1         10.0          5.0
       2 │     2         10.0          5.0
       3 │     3         10.0          5.0