julia dataframes.jl

Julia DataFrames convert all columns from Int to String


Any idea why this is not working?

transform(df, All() .=> string; renamecols=false)

Isn't it supposed to apply the string function to every column and thereby convert them? It works when I add ByRow, but I expected an operation like this to act on entire columns, not on each row.


Solution

  • What you describe works as expected. Without ByRow, string receives each column as a whole vector and converts that vector into a single string (the vector itself, not its elements). To work on the elements of the vector, use ByRow, as you noted, or use broadcasting:

    julia> df = DataFrame(x=1:2, y=3:4, z=5:6)
    2×3 DataFrame
     Row │ x      y      z
         │ Int64  Int64  Int64
    ─────┼─────────────────────
       1 │     1      3      5
       2 │     2      4      6
    
    julia> transform(df, All() .=> string; renamecols=false)
    2×3 DataFrame
     Row │ x       y       z
         │ String  String  String
    ─────┼────────────────────────
       1 │ [1, 2]  [3, 4]  [5, 6]
       2 │ [1, 2]  [3, 4]  [5, 6]
    
    julia> transform(df, All() .=> ByRow(string); renamecols=false)
    2×3 DataFrame
     Row │ x       y       z
         │ String  String  String
    ─────┼────────────────────────
       1 │ 1       3       5
       2 │ 2       4       6
    
    julia> string.(df) # broadcasting version
    2×3 DataFrame
     Row │ x       y       z
         │ String  String  String
    ─────┼────────────────────────
       1 │ 1       3       5
       2 │ 2       4       6
    
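    If you prefer to keep the operation column-wise, as the question asks, one option is to pass a function that takes the whole column and broadcasts string over it internally. The following is a minimal sketch; to_strings is a hypothetical helper name introduced here for illustration, and mapcols is shown as an alternative that applies the same function to every column:

    ```julia
    using DataFrames

    df = DataFrame(x=1:2, y=3:4, z=5:6)

    # Hypothetical helper: receives a whole column and returns a vector
    # of strings with one entry per element.
    to_strings(col) = string.(col)

    # The function is called once per column, not once per row.
    a = transform(df, All() .=> to_strings; renamecols=false)

    # mapcols applies the same whole-column function to every column.
    b = mapcols(to_strings, df)
    ```

    Both results should match the ByRow(string) and string.(df) versions above, because the helper returns one string per element rather than a single string for the whole column.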

    The reason why All() .=> string still gives you a column with one entry per row is that transform requires the result to have the same number of rows as the source data frame. The function returns a single string per column, so that string is repeated in every row. Note that with combine you would get a single row:

    julia> combine(df, All() .=> string; renamecols=false)
    1×3 DataFrame
     Row │ x       y       z
         │ String  String  String
    ─────┼────────────────────────
       1 │ [1, 2]  [3, 4]  [5, 6]
    

    To highlight the issue, see how string operates on a vector without DataFrames.jl:

    julia> string([1, 2])
    "[1, 2]"
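
    For contrast, here is a small sketch using only Base Julia that puts the whole-vector call next to its element-wise counterparts. The variable names are illustrative only:

    ```julia
    v = [1, 2]

    # Calling string on the vector yields one String describing the whole vector.
    whole = string(v)

    # Broadcasting with the dot applies string to each element separately.
    each = string.(v)

    # map is an equivalent element-wise spelling.
    mapped = map(string, v)
    ```

    Here whole is the single string "[1, 2]", while each and mapped are two-element vectors of strings. ByRow(string) gives the element-wise behavior inside transform.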