julia dataframes.jl

Julia DataFrames concatenate multiple columns by a space


I have a data frame with 4 columns of type String. I want to concatenate their values row by row, separated by a space.

Currently, I'm doing this:

transform(df, All() => ((a,b,c,d) -> a .* " " .* b .* " " .* c .* " " .* d) => :combined_col)

Is there a more concise way of doing this without using .* multiple times? Maybe using the join function?

P.S. I'm using this inside a @chain, so I want to keep the same style of syntax, without indexing.

UPDATE: This works, but I have no idea why. Can someone explain?

transform(df, All() => ByRow((all...) -> join(all, " ")) => :combined)

Solution

  • Let me explain transform(df, All() => ByRow((all...) -> join(all, " ")) => :combined):

    1. You need ByRow to apply the function row-wise to your data frame.
    2. The join function accepts an iterator as its first argument, so all must be an iterator (in your example, it is a tuple).
    3. The All() source passes the selected columns as consecutive positional arguments to the function. Therefore you need all... to turn consecutive positional arguments into a tuple.
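
    The three points above can be sketched in plain Julia, without DataFrames. This is only an illustration: the vectors a, b, c, d and the name row_fn are hypothetical stand-ins for your columns and your anonymous function, and map is used as a rough approximation of what ByRow does.

    ```julia
    # Four hypothetical String columns, two rows each.
    a = ["x1", "x2"]; b = ["y1", "y2"]; c = ["z1", "z2"]; d = ["w1", "w2"]

    # Varargs: `args...` collects all positional arguments into a single tuple,
    # and join accepts that tuple as its iterable.
    row_fn = (args...) -> join(args, " ")

    # Called with the values of one row, the function sees a tuple of 4 strings.
    first_row = row_fn("x1", "y1", "z1", "w1")

    # Roughly what ByRow does: call the function once per row.
    combined = map(row_fn, a, b, c, d)

    println(first_row)
    println(combined)
    ```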

    Instead of all... you could write:

    transform(df, AsTable(All()) => ByRow(x -> join(x, " ")) => :combined)
    

    The difference is that AsTable(All()) passes the selected columns to the function as a single positional argument, in the form of a named tuple. With ByRow, the function receives one named tuple per row, so you already have an iterable to pass to join (iterating a named tuple yields its values).
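
    To see why a named tuple works with join, here is a small sketch in plain Julia. The named tuple row is a hypothetical stand-in for what the function receives for one row; the column names are made up.

    ```julia
    # One row as a named tuple (hypothetical column names and values).
    row = (a = "x1", b = "y1", c = "z1", d = "w1")

    # Iterating a named tuple yields its values, not its names.
    vals = collect(row)

    # So join can consume it directly.
    joined = join(row, " ")

    println(vals)
    println(joined)
    ```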

    Going back to your original question of how to use .* to get the result, the answer is:

    transform(df, All() => ((x...) -> foldl((p, q) -> p .* " " .* q, x)) => :combined)
    

    Note that you do not need ByRow in this case as .* already does broadcasting. You would need it if you used * instead of .*:

    transform(df, All() => ByRow((x...) -> foldl((p, q) -> p * " " * q, x)) => :combined)
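
    The difference between the two foldl variants can be checked in plain Julia. This is a minimal sketch: a, b, c are hypothetical columns, and map again stands in for ByRow.

    ```julia
    # Three hypothetical String columns, two rows each.
    a = ["x1", "x2"]; b = ["y1", "y2"]; c = ["z1", "z2"]

    # Whole-column version: .* broadcasts, so foldl combines entire vectors.
    by_column = foldl((p, q) -> p .* " " .* q, (a, b, c))

    # Row-wise version: * concatenates plain strings, so the function
    # has to be applied once per row.
    by_row = map((x...) -> foldl((p, q) -> p * " " * q, x), a, b, c)

    println(by_column)
    println(by_row)
    ```

    Both variants give the same vector of strings; they differ only in whether the broadcasting happens inside the function or around it.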