Tags: dataframe, julia, dataframes.jl

Replace missing values in a Julia DataFrame only for a subset of columns


New to Julia alert! I have the following Julia DataFrame that includes missing values:

dat = DataFrame(a = [1,missing,3], b = [missing,5,6], c = [7,8,missing])

I want to replace missing values only for the following subset of the columns

relevant_cols = [:a, :b]

and leave the other column (c) the way it is.

Both of the following options replace the missing values in the relevant columns, but each returns a new DataFrame containing only those columns, so column c, which I want to keep exactly as it is, gets dropped.

 coalesce.(dat[!,relevant_cols], 0)
 mapcols(col -> replace(col, missing => 0), dat[!,relevant_cols])

How can I replace missing values in a subset of columns but still keep the others?


Solution

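  • Using ifelse might be the most straightforward way. Assigning to dat[:, cols] writes the result back into the existing columns, so column c is left untouched: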

    cols = [:a, :b]
    
    dat[:, cols] = ifelse.(ismissing.(dat[:, cols]), 0, dat[:, cols]);
    
    dat
    3×3 DataFrame
     Row │ a       b       c
         │ Int64?  Int64?  Int64?
    ─────┼─────────────────────────
       1 │      1       0        7
       2 │      0       5        8
       3 │      3       6  missing
    

    Or loop over the selected columns and use replace:

    cols = [:a, :b]
    
    for col in cols
      # overwrite the column's values in place with its missing-free copy
      dat[:, col] = replace(dat[:, col], missing => 0)
    end
    
    dat
    3×3 DataFrame
     Row │ a       b       c
         │ Int64?  Int64?  Int64?
    ─────┼─────────────────────────
       1 │      1       0        7
       2 │      0       5        8
       3 │      3       6  missing
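
A further variant, offered as a sketch rather than as part of the original answer: coalesce can be broadcast over one column at a time and assigned back with `!` instead of `:`. Assigning with `!` replaces the column rather than writing into it, so the element type of a and b narrows from Int64? to Int64, whereas both outputs above still show Int64?. The snippet reuses dat and relevant_cols from the question and assumes DataFrames.jl 1.x.

```julia
using DataFrames

dat = DataFrame(a = [1, missing, 3], b = [missing, 5, 6], c = [7, 8, missing])
relevant_cols = [:a, :b]

for col in relevant_cols
    # coalesce.(v, 0) returns a new vector with every missing replaced by 0;
    # storing it with `!` swaps the whole column, so c is never touched
    dat[!, col] = coalesce.(dat[!, col], 0)
end

dat
```

The same replacement can also be written as a single call, transform!(dat, relevant_cols .=> ByRow(x -> coalesce(x, 0)); renamecols = false), where renamecols = false keeps the original column names instead of appending a function suffix.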