I'm trying to replicate the repeat_by
and explode
functions from Polars using dbt with a Redshift database but am having difficulty finding an equivalent solution.
Here's the sample Polars DataFrame code for context:
NOTE: I don't need replicate
column, if that's an issue, then don't worry about it
import polars as pl
# Sample DataFrame
df = pl.DataFrame({
"name": ["Alice", "Bob", "Charlie"],
"age": [25, 30, 35],
"city": ["New York", "Los Angeles", "Chicago"],
"replicate": [2, 3, 1]
})
Polars DataFrame output:
shape: (3, 4)
┌─────────┬─────┬─────────────┬───────────┐
│ name ┆ age ┆ city ┆ replicate │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ str ┆ i64 │
╞═════════╪═════╪═════════════╪═══════════╡
│ Alice ┆ 25 ┆ New York ┆ 2 │
│ Bob ┆ 30 ┆ Los Angeles ┆ 3 │
│ Charlie ┆ 35 ┆ Chicago ┆ 1 │
└─────────┴─────┴─────────────┴───────────┘
Using the repeat_by
and explode
functions:
df.select(pl.all().repeat_by('replicate').explode())
Result:
shape: (6, 4)
┌─────────┬─────┬─────────────┬───────────┐
│ name ┆ age ┆ city ┆ replicate │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ str ┆ i64 │
╞═════════╪═════╪═════════════╪═══════════╡
│ Alice ┆ 25 ┆ New York ┆ 2 │
│ Alice ┆ 25 ┆ New York ┆ 2 │
│ Bob ┆ 30 ┆ Los Angeles ┆ 3 │
│ Bob ┆ 30 ┆ Los Angeles ┆ 3 │
│ Bob ┆ 30 ┆ Los Angeles ┆ 3 │
│ Charlie ┆ 35 ┆ Chicago ┆ 1 │
└─────────┴─────┴─────────────┴───────────┘
I need help finding a way to achieve the same result in dbt with Redshift.
Thanks!
I used JOIN and a CROSS JOIN since you mentioned generate_series is not supported.
I tested this in Snowflake , since I dont have redshift. I guess the syntax might not be the exact same, but still should work.
let me know if it works for you.
SET tbl_count = (SELECT max(replicate) FROM my_table);
WITH numbers AS (
SELECT
row_number() OVER ( order by 1) AS num
FROM
(SELECT 1 FROM my_table LIMIT $tbl_count ) )
,
repeated_data AS (
SELECT
t.name,
t.age,
t.city,
t.replicate
FROM
my_table t
CROSS JOIN numbers
WHERE numbers.num <= t.replicate
)
SELECT
name,
age,
city,
replicate
FROM
repeated_data
ORDER BY name, age;
Trying with Recursion since using Limit variable is tricky in Redshift
WITH RECURSIVE numbers AS (
-- Base case: start from 1
SELECT 1 AS num
UNION ALL
-- Recursive case: increment the number by 1 each time
SELECT num + 1
FROM numbers
WHERE num < (SELECT max(replicate) FROM my_table) -- Stop when num reaches max(replicate)
),
repeated_data AS (
SELECT
t.name,
t.age,
t.city,
t.replicate
FROM
my_table t
CROSS JOIN numbers
WHERE numbers.num <= t.replicate
)
SELECT
name,
age,
city,
replicate
FROM
repeated_data
ORDER BY
name, age;
Output :