In Oracle I have the following hierarchical sql:
with prod as (select 10 id,'1008' code from dual union all
select 11 id,'1582' code from dual union all
select 12 id,'1583' code from dual union all
select 13 id,'2023' code from dual union all
select 14 id,'2025' code from dual union all
select 15 id,'2030' code from dual union all
select 16 id,'2222' code from dual
),
prre as (select 10 detail_product_id,90 master_product_id from dual union all
select 12 detail_product_id,11 master_product_id from dual union all
select 91 detail_product_id,92 master_product_id from dual union all
select 14 detail_product_id,12 master_product_id from dual union all
select 90 detail_product_id,93 master_product_id from dual union all
select 11 detail_product_id,91 master_product_id from dual union all
select 15 detail_product_id,12 master_product_id from dual union all
select 13 detail_product_id,12 master_product_id from dual union all
select 94 detail_product_id,95 master_product_id from dual
)
select
prod.code,
connect_by_root prod.code group_type
from prre,prod
where prre.detail_product_id = prod.id
connect by nocycle prior prre.detail_product_id = prre.master_product_id
start with prod.code in ('1008', '1582')
Resulting in:
CODE GROUP_TYPE
1582 1582
1583 1582
2030 1582
2025 1582
2023 1582
1008 1008
How can I rewrite it to polars? The starting dfs will be like:
import polars as pl
prod = pl.DataFrame({'id': [10,11,12,13,14,15,16],
'code': ['1008','1582','1583','2023','2025','2030','2222']
})
prre = pl.DataFrame({'detail_product_id': [10,12,91,14,90,11,15,13,94],
'master_product_id': [90,11,92,12,93,91,12,12,95]
})
But how to go on?
The task in plain English is:
The found structure looks like this:
11 -> 1582
|-12 -> 1583
|-13 -> 2023
|-14 -> 2025
|-15 -> 2030
Essentially, what you want is recursive query. If you want pure polars, then I'm afraid you need to join iteratively.
The idea is relatively simple. First, you create starting dataframe with filter()
. Then, during each iteration you use join()
to create the dataframe which contains next level in the hierarchy. While iterating you save these dataframes into list. Iterations stop when you get to the end of the hierarchy. At the end you use .concat()
to combine all the dataframes together.
df = (
prod
.filter(pl.col.code.is_in(['1008', '1582']))
.with_columns(group_type = pl.col.code)
)
dfs = [df]
while True:
df = df.join(prre, left_on="id", right_on="master_product_id")
if df.height == 0:
break
df = (
prod
.join(df, left_on="id", right_on="detail_product_id")
.select("id", "code", "group_type")
)
dfs.append(df)
pl.concat(dfs).drop("id")
┌──────┬────────────┐
│ code ┆ group_type │
│ --- ┆ --- │
│ str ┆ str │
╞══════╪════════════╡
│ 1008 ┆ 1008 │
│ 1582 ┆ 1582 │
│ 1583 ┆ 1582 │
│ 2023 ┆ 1582 │
│ 2025 ┆ 1582 │
│ 2030 ┆ 1582 │
└──────┴────────────┘
But you can also use DuckDB integration with polars and recursive cte
import duckdb
duckdb.sql("""
with recursive cte as (
select
d.code,
d.code as group_type,
d.id,
1 as depth
from prod as d
where
d.code in ('1008', '1582')
union all
select
d.code,
c.group_type,
d.id,
c.depth + 1
from cte as c
inner join prre as p on
p.master_product_id = c.id
inner join prod as d on
d.id = p.detail_product_id
)
select
code,
group_type,
depth
from cte as c
order by
depth asc
""")
┌─────────┬────────────┬───────┐
│ code │ group_type │ depth │
│ varchar │ varchar │ int32 │
├─────────┼────────────┼───────┤
│ 1008 │ 1008 │ 1 │
│ 1582 │ 1582 │ 1 │
│ 1583 │ 1582 │ 2 │
│ 2023 │ 1582 │ 3 │
│ 2025 │ 1582 │ 3 │
│ 2030 │ 1582 │ 3 │
└─────────┴────────────┴───────┘