julia dataframes.jl

Assigning a missing value to a particular value in a DataFrame


Hi, I have a DataFrame in which I want to replace a particular value, such as 99, with missing. This is difficult because the columns have element type Int64, which cannot hold missing. One way to do it is:

using DataFrames

df = DataFrame(
    :x1 => [1,2,99],
    :x2 => [10,99,11],
    :x3 => [20,21,22]
)

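# Non-mutating replace returns new vectors whose element type allows missing
amended_cols = replace.(eachcol(df), 99 => missing)
# Pair each column name with its amended column and rebuild the DataFrame
df_n = map((x, y) -> x => y, names(df), amended_cols) |> DataFrame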

julia> @show df_n
df_n = 3×3 DataFrame
 Row │ x1       x2       x3
     │ Int64?   Int64?   Int64?
─────┼──────────────────────────
   1 │       1       10      20
   2 │       2  missing      21
   3 │ missing       11      22

However, I am wondering whether there is an easier or better way to do this, such as using replace! to mutate the existing DataFrame. Doing so results in a type conversion error:

replace!.(eachcol(df), 99 => missing)

ERROR: MethodError: Cannot `convert` an object of type Missing to an object of type Int64
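
The error comes from the element type of the columns rather than from DataFrames itself. As a minimal sketch using plain vectors (the variable names v, w, u and failed are illustrative only and not part of the original post), the difference between replace and replace! can be reproduced like this:

```julia
# Plain Base Julia, no packages needed.
v = [1, 2, 99]                       # element type is Int

# replace allocates a new vector whose element type is widened to hold missing.
w = replace(v, 99 => missing)

# replace! has to write into v itself, and a Vector{Int} cannot store missing,
# so the call throws the MethodError shown above.
failed = try
    replace!(v, 99 => missing)
    false
catch err
    err isa MethodError
end

# Widening the element type first makes the in-place call valid.
u = convert(Vector{Union{Int, Missing}}, v)
replace!(u, 99 => missing)
```

Any in-place approach therefore has to widen the column type before the replacement.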

Solution

  • You can use the allowmissing! function (official docs), which changes the type of each column in the DataFrame to allow missing values. The element type of each column changes from T to Union{T, Missing}, where T is the original element type; Union{T, Missing} means that a value can be either of type T or the special value missing.

    df = DataFrame(
        :x1 => [1,2,99],
        :x2 => [10,99,11],
        :x3 => [20,21,22]
    )
    
    # Converting df to allow missing data
    allowmissing!(df)
    
    replace!.(eachcol(df), 99 => missing)
    

    The exclamation mark ! in allowmissing!(df) indicates that the function modifies df in place instead of creating a new DataFrame. If you want to create a new DataFrame and leave df untouched, the code could be:

    df = DataFrame(
        :x1 => [1,2,99],
        :x2 => [10,99,11],
        :x3 => [20,21,22]
    )
    
    df_n = allowmissing(df)
    
    df_n = DataFrame(replace.(eachcol(df_n), 99 => missing), names(df_n))
    

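
Two variations on the approach above may also be useful; treat them as a sketch rather than the only way to do it. The first uses mapcols from DataFrames.jl to build a new DataFrame in one step. The second widens only the columns that actually contain 99, so the remaining columns keep their original type. The helper variable cols is introduced here purely for illustration.

```julia
using DataFrames

df = DataFrame(
    :x1 => [1, 2, 99],
    :x2 => [10, 99, 11],
    :x3 => [20, 21, 22]
)

# New DataFrame: mapcols applies the function to every column.
# Non-mutating replace widens each column's element type by itself.
df_n = mapcols(col -> replace(col, 99 => missing), df)

# In place: find the columns that contain 99 and widen only those.
cols = [n for n in names(df) if 99 in df[!, n]]
allowmissing!(df, cols)
foreach(c -> replace!(df[!, c], 99 => missing), cols)
```

After the in-place version, x3 is still an Int64 column because it never contained 99, whereas every column of df_n allows missing.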