
A function to boxplot a column of a DataFrame in Julia


In Julia, one can draw a boxplot using StatsPlots.jl. Assuming there is a DataFrame named df, we can draw a boxplot of its column named a like this:

julia> @df df boxplot(["a"], :a, fillalpha=0.75, linewidth=2)

I want to put the same structure in a function:

julia> function BoxPlotColumn(col::Union{Symbol, String}, df::DataFrame)
           if isa(col, String)
               @df df boxplot([col], Symbol(col), fillalpha=0.75, linewidth=2)
           else
               @df df boxplot([String(col)], col, fillalpha=0.75, linewidth=2)
           end
       end
BoxPlotColumn (generic function with 1 method)

Then, if I call BoxPlotColumn("a", df), Julia throws an error:

ERROR: Cannot convert Symbol to series data for plotting
Stacktrace:
  [1] error(s::String)
    @ Base .\error.jl:35
  [2] _prepare_series_data(x::Symbol)
    @ RecipesPipeline C:\Users\Shayan\.julia\packages\RecipesPipeline\OXGmH\src\series.jl:8
  [3] _series_data_vector(x::Symbol, plotattributes::Dict{Symbol, Any})
    @ RecipesPipeline C:\Users\Shayan\.julia\packages\RecipesPipeline\OXGmH\src\series.jl:35
  [4] macro expansion
    @ C:\Users\Shayan\.julia\packages\RecipesPipeline\OXGmH\src\series.jl:135 [inlined]
  [5] apply_recipe(plotattributes::AbstractDict{Symbol, Any}, #unused#::Type{RecipesPipeline.SliceIt}, x::Any, y::Any, z::Any)
    @ RecipesPipeline C:\Users\Shayan\.julia\packages\RecipesBase\qpxEX\src\RecipesBase.jl:289
  [6] _process_userrecipes!(plt::Any, plotattributes::Any, args::Any)
    @ RecipesPipeline C:\Users\Shayan\.julia\packages\RecipesPipeline\OXGmH\src\user_recipe.jl:36
  [7] recipe_pipeline!(plt::Any, plotattributes::Any, args::Any)
    @ RecipesPipeline C:\Users\Shayan\.julia\packages\RecipesPipeline\OXGmH\src\RecipesPipeline.jl:70
  [8] _plot!(plt::Plots.Plot, plotattributes::Any, args::Any)
    @ Plots C:\Users\Shayan\.julia\packages\Plots\lW9ll\src\plot.jl:209
  [9] #plot#145
    @ C:\Users\Shayan\.julia\packages\Plots\lW9ll\src\plot.jl:91 [inlined]
 [10] boxplot(::Any, ::Vararg{Any}; kw::Base.Pairs{Symbol, V, Tuple{Vararg{Symbol, N}}, NamedTuple{names, T}} where {V, N, names, T<:Tuple{Vararg{Any, N}}})
    @ Plots C:\Users\Shayan\.julia\packages\RecipesBase\qpxEX\src\RecipesBase.jl:410
 [11] add_label(::Vector{String}, ::typeof(boxplot), ::Vector{String}, ::Vararg{Any}; kwargs::Base.Pairs{Symbol, Real, Tuple{Symbol, Symbol}, NamedTuple{(:fillalpha, :linewidth), Tuple{Float64, Int64}}})
    @ StatsPlots C:\Users\Shayan\.julia\packages\StatsPlots\faFN5\src\df.jl:153
 [12] (::var"#33#34"{String})(349::DataFrame)
    @ Main .\none:0
 [13] BoxPlotColumn(col::String, df::DataFrame)
    @ Main c:\Users\Shayan\Documents\Python Scripts\test2.jl:15
 [14] top-level scope
    @ c:\Users\Shayan\Documents\Python Scripts\test2.jl:22

The error comes from this line: @df df boxplot([col], Symbol(col), fillalpha=0.75, linewidth=2). Why does this happen, and how can I fix it? I wrote the same thing, just in a function.


Solution

  • I wrote the same thing just in a function.

    You have not written the same thing. In your original code you use String and Symbol literals, whereas in the function you pass a variable. This is the key difference.
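    To see the difference concretely, here is a small sketch using a hypothetical toy macro, @seen, which is not part of any package. It returns the syntax it receives instead of evaluating it. The literal :a reaches the macro as a QuoteNode wrapping the Symbol :a, while the variable col reaches it only as the bare name col; the macro never sees the value stored in col.

    ```julia
    # Hypothetical toy macro (illustration only): it returns the syntax it
    # received as a value instead of evaluating it.
    macro seen(ex)
        return QuoteNode(ex)
    end

    col = :a                  # a variable that holds the Symbol :a

    from_lit = @seen :a       # the literal :a arrives as QuoteNode(:a)
    from_var = @seen col      # the variable arrives as the bare name :col

    println(typeof(from_lit)) # prints QuoteNode
    println(from_lit.value)   # prints a
    println(from_var)         # prints col, not a: the value of col is never seen
    ```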

    To fix this, I recommend using @with from DataFramesMeta.jl:

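    using DataFrames, DataFramesMeta, StatsPlots
    # Inside @with, $col refers to the column of df whose name is stored in col
    BoxPlotColumn(col::Union{Symbol, String}, df::DataFrame) =
        @with df boxplot([string(col)], $col, fillalpha=0.75, linewidth=2)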
    

    This does what you want, because @with lets you refer to columns programmatically with $: $col stands for the column of df whose name is stored in col.
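    A rough sketch of how $ interpolation can work in a @with-style macro is below. It uses a hypothetical toy macro, @mini_with, which is not DataFramesMeta's actual implementation, and a NamedTuple of vectors as a stand-in for a DataFrame. Inside a macro argument, $col is parsed as the expression Expr(:$, :col); the macro rewrites it into getproperty(t, Symbol(col)), which is evaluated at run time, when col already holds a Symbol or a String.

    ```julia
    # Hypothetical toy macro (illustration only, not DataFramesMeta's code).
    # Inside a macro argument, $x parses as Expr(:$, x). The macro rewrites it
    # into getproperty(t, Symbol(x)), which is evaluated at run time, so x may
    # be a variable holding either a Symbol or a String.
    function replace_interp(ex, t)
        if ex isa Expr && ex.head === :$
            return :(getproperty($t, Symbol($(ex.args[1]))))
        elseif ex isa Expr
            return Expr(ex.head, (replace_interp(a, t) for a in ex.args)...)
        else
            return ex
        end
    end

    macro mini_with(t, ex)
        # esc: the rewritten code runs in the caller's scope
        return esc(replace_interp(ex, t))
    end

    # Works with a variable column name, like BoxPlotColumn in the question
    colsum(col::Union{Symbol, String}, t) = @mini_with t sum($col)

    tbl = (a = [1, 2, 3], b = [10, 20, 30])  # NamedTuple standing in for a DataFrame
    colsum("a", tbl)   # 6
    colsum(:b, tbl)    # 60
    ```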

    EDIT

    Why doesn't Julia work when we write boxplot(..., col, ...)?

    It does not work because both @df and @with are macros. A macro transforms code into other code, and that new code is only executed later. These macros are designed so that when they see a Symbol literal, e.g. :a, they treat it in a special way and interpret it as a column of the data frame. When they see a variable such as col, they cannot know that this variable will hold a Symbol, because the macro runs before the code is evaluated (remember: a macro transforms code into other code before that code is executed). See https://docs.julialang.org/en/v1/manual/metaprogramming/#man-macros
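    The sketch below imitates the column-literal rule of a @df-style macro with a hypothetical toy macro, @mini_df (not StatsPlots' actual implementation), again using a NamedTuple as a stand-in for a DataFrame. Symbol literals such as :a are rewritten into column lookups while the code is transformed; a variable such as col is left as it is and evaluates to a plain Symbol at run time. Presumably the same thing happens to Symbol(col) in the question: it is a function call rather than a literal, so boxplot receives the Symbol :a itself, which matches the "Cannot convert Symbol to series data for plotting" error.

    ```julia
    # Hypothetical mini version of a @df-style macro (illustration only).
    # It rewrites every Symbol literal such as :a into getproperty(t, :a), where
    # t is the table given as the first macro argument. Everything else,
    # including plain variables and function calls, is copied unchanged.
    replace_cols(ex, t) = ex
    replace_cols(ex::QuoteNode, t) =
        ex.value isa Symbol ? :(getproperty($t, $ex)) : ex
    replace_cols(ex::Expr, t) = Expr(ex.head, (replace_cols(a, t) for a in ex.args)...)

    macro mini_df(t, ex)
        # esc: the rewritten code runs in the caller's scope
        return esc(replace_cols(ex, t))
    end

    tbl = (a = [1, 2, 3], b = [10, 20, 30])  # NamedTuple standing in for a DataFrame
    col = :a

    s = @mini_df tbl sum(:a)        # :a became getproperty(tbl, :a), so s == 6
    v = @mini_df tbl identity(col)  # col was left alone, so v is just the Symbol :a
    ```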

    MethodError: no method matching isfinite(::String15)

    Most likely one of your columns holds strings rather than numbers. To get only the columns that store real numbers (without missing), write e.g. names(df, Real). If you also want to allow missing values, write names(df, Union{Missing, Real}).
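    As a Base-only illustration of the filtering rule behind names(df, Real), the sketch below uses a hypothetical helper, numeric_names, on a NamedTuple standing in for a DataFrame: it keeps the columns whose element type is a subtype of the requested type. A column containing missing has an element type like Union{Missing, Int64}, which is not a subtype of Real, so it only passes the Union{Missing, Real} filter.

    ```julia
    # Hypothetical Base-only helper (illustration only): return the names of the
    # columns whose element type is a subtype of T.
    numeric_names(t::NamedTuple, T::Type) =
        [string(k) for (k, v) in pairs(t) if eltype(v) <: T]

    # NamedTuple standing in for a DataFrame with a Float64 column, a String
    # column and a Union{Missing, Int64} column
    tbl = (a = [1.0, 2.0], s = ["x", "y"], m = [1, missing])

    numeric_names(tbl, Real)                  # ["a"]
    numeric_names(tbl, Union{Missing, Real})  # ["a", "m"]
    ```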