javaapache-sparkflatmap

Transpose multiple columns to rows in Java Dataset


I have a dataset in spark which has values like

org_id user_id call_id audio_bw video_bw share_bw
1 1 1 2 3 4

I am able to transpose it into the following using multiple SQL queries, bit wanted to do it using code, if possible using flatMap

org_id user_id call_id type bw
1 1 1 audio 2
1 1 1 video 3
1 1 1 share 4

Solution

  • Assuming that your dataset is called data, this:

    data = data.select(
      col("org_id"), 
      col("user_id"), 
      col("call_id"),
      expr("stack(3, 'audio', audio_bw, 'video', video_bw, 'share', share_bw) as (type, bw)")
    )
    

    does what you want, the final output table:

    +------+-------+-------+-----+---+
    |org_id|user_id|call_id| type| bw|
    +------+-------+-------+-----+---+
    |     1|      1|      1|audio|  2|
    |     1|      1|      1|video|  3|
    |     1|      1|      1|share|  4|
    +------+-------+-------+-----+---+
    

    More about stack can be found here!

    EDIT:

    In case you have multiple columns to stack:

    data
      .select(
        col("org_id"),
        col("user_id"),
        col("call_id"),
        // concatinate your data
        expr("concat_ws(' - ', audio_rx_bw, audio_tx_bw)").as("audio"),
        expr("concat_ws(' - ', video_rx_bw, video_tx_bw)").as("video"),
        expr("concat_ws(' - ', share_rx_bw, share_tx_bw)").as("share")
      )
      .select(
        col("org_id"),
        col("user_id"),
        col("call_id"),
        // stack again, but this time for the concatenated values
        expr("stack(3, 'audio', audio, 'video', video, 'share', share) as (type, data)")
      )
      // once done with stacking, get the data in your desired format
      .withColumn("rx_bw", split(col("data"), " - ")(0))
      .withColumn("tx_bw", split(col("data"), " - ")(1))
      .drop("data")
    

    Input:

    +------+-------+-------+-----------+-----------+-----------+-----------+-----------+-----------+
    |org_id|user_id|call_id|audio_rx_bw|audio_tx_bw|video_rx_bw|video_tx_bw|share_rx_bw|share_tx_bw|
    +------+-------+-------+-----------+-----------+-----------+-----------+-----------+-----------+
    |     1|      1|      1|          2|          3|          4|          2|          4|          5|
    +------+-------+-------+-----------+-----------+-----------+-----------+-----------+-----------+
    

    Output:

    +------+-------+-------+-----+-----+-----+
    |org_id|user_id|call_id| type|rx_bw|tx_bw|
    +------+-------+-------+-----+-----+-----+
    |     1|      1|      1|audio|    2|    3|
    |     1|      1|      1|video|    4|    2|
    |     1|      1|      1|share|    4|    5|
    +------+-------+-------+-----+-----+-----+
    

    for this input: