pythonsqlpython-polarsrecursive-query

How to rewrite Oracle sql's hierarchical query to polars?


In Oracle I have the following hierarchical sql:

with prod as (select 10 id,'1008' code from dual union all
              select 11 id,'1582' code from dual union all
              select 12 id,'1583' code from dual union all
              select 13 id,'2023' code from dual union all
              select 14 id,'2025' code from dual union all
              select 15 id,'2030' code from dual union all
              select 16 id,'2222' code from dual
              ),
     prre as (select 10 detail_product_id,90 master_product_id from dual union all
              select 12 detail_product_id,11 master_product_id from dual union all
              select 91 detail_product_id,92 master_product_id from dual union all
              select 14 detail_product_id,12 master_product_id from dual union all
              select 90 detail_product_id,93 master_product_id from dual union all
              select 11 detail_product_id,91 master_product_id from dual union all
              select 15 detail_product_id,12 master_product_id from dual union all
              select 13 detail_product_id,12 master_product_id from dual union all
              select 94 detail_product_id,95 master_product_id from dual
              )
select
   prod.code,
   connect_by_root prod.code group_type
from prre,prod
where prre.detail_product_id = prod.id
connect by nocycle prior prre.detail_product_id = prre.master_product_id
start with prod.code in ('1008', '1582')

Resulting in:

CODE    GROUP_TYPE
1582    1582
1583    1582
2030    1582
2025    1582
2023    1582
1008    1008

How can I rewrite it to polars? The starting dfs will be like:

import polars as pl
prod = pl.DataFrame({'id': [10,11,12,13,14,15,16],
                     'code': ['1008','1582','1583','2023','2025','2030','2222']
                     })
prre = pl.DataFrame({'detail_product_id': [10,12,91,14,90,11,15,13,94],
                     'master_product_id': [90,11,92,12,93,91,12,12,95]
                     })

But how to go on?
The task in plain English is:

The found structure looks like this:

11 -> 1582
|-12 -> 1583
  |-13 -> 2023
  |-14 -> 2025
  |-15 -> 2030

Solution

  • Essentially, what you want is recursive query. If you want pure polars, then I'm afraid you need to join iteratively.

    The idea is relatively simple. First, you create starting dataframe with filter(). Then, during each iteration you use join() to create the dataframe which contains next level in the hierarchy. While iterating you save these dataframes into list. Iterations stop when you get to the end of the hierarchy. At the end you use .concat() to combine all the dataframes together.

    df = (
        prod
        .filter(pl.col.code.is_in(['1008', '1582']))
        .with_columns(group_type = pl.col.code)
    )
    dfs = [df]
    
    while True:
        df = df.join(prre, left_on="id", right_on="master_product_id")
        if df.height == 0:
            break
        df = (
            prod
            .join(df, left_on="id", right_on="detail_product_id")
            .select("id", "code", "group_type")
        )
    
        dfs.append(df)
    
    pl.concat(dfs).drop("id")
    
    ┌──────┬────────────┐
    │ code ┆ group_type │
    │ ---  ┆ ---        │
    │ str  ┆ str        │
    ╞══════╪════════════╡
    │ 1008 ┆ 1008       │
    │ 1582 ┆ 1582       │
    │ 1583 ┆ 1582       │
    │ 2023 ┆ 1582       │
    │ 2025 ┆ 1582       │
    │ 2030 ┆ 1582       │
    └──────┴────────────┘
    

    But you can also use DuckDB integration with polars and recursive cte

    import duckdb
    
    duckdb.sql("""
        with recursive cte as (
            select
                d.code,
                d.code as group_type,
                d.id,
                1 as depth
            from prod as d
            where
                d.code in ('1008', '1582')
    
            union all
    
            select
                d.code,
                c.group_type,
                d.id,
                c.depth + 1
            from cte as c
                inner join prre as p on
                    p.master_product_id = c.id
                inner join prod as d on
                    d.id = p.detail_product_id
        )
        select
            code,
            group_type,
            depth
        from cte as c
        order by
            depth asc
    """)
    
    ┌─────────┬────────────┬───────┐
    │  code   │ group_type │ depth │
    │ varchar │  varchar   │ int32 │
    ├─────────┼────────────┼───────┤
    │ 1008    │ 1008       │     1 │
    │ 1582    │ 1582       │     1 │
    │ 1583    │ 1582       │     2 │
    │ 2023    │ 1582       │     3 │
    │ 2025    │ 1582       │     3 │
    │ 2030    │ 1582       │     3 │
    └─────────┴────────────┴───────┘