I have a dataset in spark which has values like
org_id | user_id | call_id | audio_bw | video_bw | share_bw |
---|---|---|---|---|---|
1 | 1 | 1 | 2 | 3 | 4 |
I am able to transpose it into the following using multiple SQL queries, bit wanted to do it using code, if possible using flatMap
org_id | user_id | call_id | type | bw |
---|---|---|---|---|
1 | 1 | 1 | audio | 2 |
1 | 1 | 1 | video | 3 |
1 | 1 | 1 | share | 4 |
Assuming that your dataset is called data
, this:
data = data.select(
col("org_id"),
col("user_id"),
col("call_id"),
expr("stack(3, 'audio', audio_bw, 'video', video_bw, 'share', share_bw) as (type, bw)")
)
does what you want, the final output table:
+------+-------+-------+-----+---+
|org_id|user_id|call_id| type| bw|
+------+-------+-------+-----+---+
| 1| 1| 1|audio| 2|
| 1| 1| 1|video| 3|
| 1| 1| 1|share| 4|
+------+-------+-------+-----+---+
More about stack
can be found here!
EDIT:
In case you have multiple columns to stack:
data
.select(
col("org_id"),
col("user_id"),
col("call_id"),
// concatinate your data
expr("concat_ws(' - ', audio_rx_bw, audio_tx_bw)").as("audio"),
expr("concat_ws(' - ', video_rx_bw, video_tx_bw)").as("video"),
expr("concat_ws(' - ', share_rx_bw, share_tx_bw)").as("share")
)
.select(
col("org_id"),
col("user_id"),
col("call_id"),
// stack again, but this time for the concatenated values
expr("stack(3, 'audio', audio, 'video', video, 'share', share) as (type, data)")
)
// once done with stacking, get the data in your desired format
.withColumn("rx_bw", split(col("data"), " - ")(0))
.withColumn("tx_bw", split(col("data"), " - ")(1))
.drop("data")
Input:
+------+-------+-------+-----------+-----------+-----------+-----------+-----------+-----------+
|org_id|user_id|call_id|audio_rx_bw|audio_tx_bw|video_rx_bw|video_tx_bw|share_rx_bw|share_tx_bw|
+------+-------+-------+-----------+-----------+-----------+-----------+-----------+-----------+
| 1| 1| 1| 2| 3| 4| 2| 4| 5|
+------+-------+-------+-----------+-----------+-----------+-----------+-----------+-----------+
Output:
+------+-------+-------+-----+-----+-----+
|org_id|user_id|call_id| type|rx_bw|tx_bw|
+------+-------+-------+-----+-----+-----+
| 1| 1| 1|audio| 2| 3|
| 1| 1| 1|video| 4| 2|
| 1| 1| 1|share| 4| 5|
+------+-------+-------+-----+-----+-----+
for this input: