In Julia, one can draw a boxplot using StatsPlots.jl. Assuming there is a DataFrame named df, we can draw a boxplot for one of its columns named a like this:
julia> @df df boxplot(["a"], :a, fillalpha=0.75, linewidth=2)
I want to put the same structure in a function:
julia> function BoxPlotColumn(col::Union{Symbol, String}, df::DataFrame)
           if isa(col, String)
               @df df boxplot([col], Symbol(col), fillalpha=0.75, linewidth=2)
           else
               @df df boxplot([String(col)], col, fillalpha=0.75, linewidth=2)
           end
       end
BoxPlotColumn (generic function with 1 method)
Then, if I call BoxPlotColumn("a", df), Julia throws an error:
ERROR: Cannot convert Symbol to series data for plotting
Stacktrace:
[1] error(s::String)
@ Base .\error.jl:35
[2] _prepare_series_data(x::Symbol)
@ RecipesPipeline C:\Users\Shayan\.julia\packages\RecipesPipeline\OXGmH\src\series.jl:8
[3] _series_data_vector(x::Symbol, plotattributes::Dict{Symbol, Any})
@ RecipesPipeline C:\Users\Shayan\.julia\packages\RecipesPipeline\OXGmH\src\series.jl:35
[4] macro expansion
@ C:\Users\Shayan\.julia\packages\RecipesPipeline\OXGmH\src\series.jl:135 [inlined]
[5] apply_recipe(plotattributes::AbstractDict{Symbol, Any}, #unused#::Type{RecipesPipeline.SliceIt}, x::Any, y::Any, z::Any)
@ RecipesPipeline C:\Users\Shayan\.julia\packages\RecipesBase\qpxEX\src\RecipesBase.jl:289
[6] _process_userrecipes!(plt::Any, plotattributes::Any, args::Any)
@ RecipesPipeline C:\Users\Shayan\.julia\packages\RecipesPipeline\OXGmH\src\user_recipe.jl:36
[7] recipe_pipeline!(plt::Any, plotattributes::Any, args::Any)
@ RecipesPipeline C:\Users\Shayan\.julia\packages\RecipesPipeline\OXGmH\src\RecipesPipeline.jl:70
[8] _plot!(plt::Plots.Plot, plotattributes::Any, args::Any)
@ Plots C:\Users\Shayan\.julia\packages\Plots\lW9ll\src\plot.jl:209
[9] #plot#145
@ C:\Users\Shayan\.julia\packages\Plots\lW9ll\src\plot.jl:91 [inlined]
[10] boxplot(::Any, ::Vararg{Any}; kw::Base.Pairs{Symbol, V, Tuple{Vararg{Symbol, N}}, NamedTuple{names, T}} where {V, N, names, T<:Tuple{Vararg{Any, N}}})
@ Plots C:\Users\Shayan\.julia\packages\RecipesBase\qpxEX\src\RecipesBase.jl:410
[11] add_label(::Vector{String}, ::typeof(boxplot), ::Vector{String}, ::Vararg{Any}; kwargs::Base.Pairs{Symbol, Real, Tuple{Symbol, Symbol}, NamedTuple{(:fillalpha, :linewidth), Tuple{Float64, Int64}}})
@ StatsPlots C:\Users\Shayan\.julia\packages\StatsPlots\faFN5\src\df.jl:153
[12] (::var"#33#34"{String})(349::DataFrame)
@ Main .\none:0
[13] BoxPlotColumn(col::String, df::DataFrame)
@ Main c:\Users\Shayan\Documents\Python Scripts\test2.jl:15
[14] top-level scope
@ c:\Users\Shayan\Documents\Python Scripts\test2.jl:22
The error comes from this line: @df df boxplot([col], Symbol(col), fillalpha=0.75, linewidth=2)
How can I fix this? Why does this happen? I wrote the same thing just in a function.
I wrote the same thing just in a function.
You have not written the same thing. In your original code you use String and Symbol literals, whereas in the function you pass a variable. This is the key difference.
To fix this, I recommend using @with from DataFramesMeta.jl:
BoxPlotColumn(col::Union{Symbol, String}, df::DataFrame) =
@with df boxplot([string(col)], $col, fillalpha=0.75, linewidth=2)
which does what you want, as @with supports working with column names programmatically with $.
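As an alternative to the macro-based fix, here is a minimal sketch that avoids macros entirely by indexing the data frame directly. The helper name boxplot_column and the sample data are hypothetical, chosen only for illustration; the sketch assumes that df[!, col] accepts both Symbol and String column names, which is standard DataFrames.jl indexing.

```julia
using DataFrames, StatsPlots

# Hypothetical helper (not from the question): no macro is involved,
# so `col` is an ordinary runtime value when the column is looked up.
function boxplot_column(col::Union{Symbol, String}, df::DataFrame)
    # df[!, col] returns the column vector for a Symbol or a String name
    return boxplot([string(col)], df[!, col], fillalpha=0.75, linewidth=2)
end

# Illustrative data, assumed for this sketch
df = DataFrame(a = randn(100))

p1 = boxplot_column("a", df)  # String column name
p2 = boxplot_column(:a, df)   # Symbol column name
```

Because the column is fetched with plain indexing, the same code path handles both argument types and no branching on isa(col, String) is needed.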
EDIT
Why doesn't Julia operate when we say boxplot(..., col, ...)?
It does not operate because both @df and @with are macros. Since they are macros, they transform code into other code that is only executed later. These macros are designed so that when they see a symbol literal, e.g. :a, they treat it in a special way and consider it to be a column of a data frame. When they see a variable col, they cannot know that this variable points to a symbol, because the macro is executed before the code is evaluated (remember - a macro is a method to transform code into other code before this code is executed). See https://docs.julialang.org/en/v1/manual/metaprogramming/#man-macros
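The difference between a literal and a variable can be made visible with a small toy macro. This is only a sketch: @argkind is a hypothetical macro written for this illustration and is not how @df or @with are implemented. It reports what kind of syntax it received at expansion time.

```julia
# Hypothetical toy macro: it inspects the piece of syntax it was given,
# not a runtime value, and returns a description as a string literal.
macro argkind(x)
    if x isa QuoteNode
        return "symbol literal"   # written as :a in the source
    elseif x isa Symbol
        return "variable name"    # written as col in the source
    else
        return "something else"
    end
end

col = :a

k1 = @argkind :a    # the macro receives the literal :a
k2 = @argkind col   # the macro receives the name `col`, not its value :a

println(k1)  # prints "symbol literal"
println(k2)  # prints "variable name"
```

Even though col holds :a at run time, the macro only ever sees the name col, which is why a macro that special-cases symbol literals cannot recognize it as a column.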
MethodError: no method matching isfinite(::String15)
Most likely you have a column that stores strings, not numbers. Instead, write e.g. names(df, Real) to get only the columns that store real numbers (without missing). If you want to allow missing, write names(df, Union{Missing,Real}).
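A short sketch of how the type-based selection behaves, using made-up column names and data that are assumptions for this example only:

```julia
using DataFrames

# Illustrative data frame: one numeric column, one string column,
# and one numeric column that also contains missing.
df = DataFrame(a = [1.0, 2.0, 3.0],
               b = ["x", "y", "z"],
               c = [1, missing, 3])

real_cols = names(df, Real)                          # → ["a"]
real_or_missing = names(df, Union{Missing, Real})    # → ["a", "c"]
```

Column c is excluded from the first call because its element type is Union{Missing, Int64}, which is not a subtype of Real. Note that a boxplot over a column containing missing may still need the missing values removed first, e.g. with skipmissing.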