I want to know if there is a way we can do the below task in SQL. Can we do this with match_ recognize?
Above is the input table. I want to group the role for a specific region where No. of employees is less than 6. So, In Africa, we have to combine Analyst, S.Analyst and manager (2+3+9) to overcome the threshold value of 6. Partner and S. Manager must be combined in India (2+7). S. Analyst can be left as it is. We have to drop the USA or add some flag that we can identify using a where clause. So the output should look like below.
So, I think I've got it now. As mentioned in my original answer, my new approach is to get all group combinations. I am working with IDs, and as your table doesn't seem to have them, my first step is to genrate them. In the same step I am numbering the rows per region, so as to start building combinations with only one row per region in the recursive part to save time. (I will get duplicate combinations, like (1,2,3) and (1,3,2), but at least avoid the additional (2,1,3), (2,3,1), (3,1,2), (3,2,1)).
I am using an ID string to get the combinations. Once a group is complete with at least 6 employees, I start a new group after a semicolon. So, with IDs 1, 2 and 3, I might end up with groups '1,2,3' (all IDs in one group), '1,2;3' (1 and 2 in one group, 3 in another) and many more, because with the recursive query I also generate incomplete groups like '1,2'. I will later find the complete groups by looking at the digits: '1,2,3' and '1,2;3' have three digits, while '1,2' only has two, so it cannot be complete. As a separator I am not using the comma, but a #, which helped me a bit when looking at the generated rows, as ',' and ';' look very similar. I also not only separate the IDs with #, but also have an # at the beginning and the end. This helps with finding an ID in the string. If I look for 1, I look for '#1#' actually, because I don't want to find the 1 in '12' or '21'. And I will find it in a string like '#1#2#;#3#', because the 1 is surrounded by #.
Once I have all combinations, I detect the best combination per region with a rather complicated ranking. At last I join the original rows to the groups and aggregate.
with
data (id, region, role, employees, rn) as
(
select
row_number() over (order by region, role, employees),
region, role, employees,
row_number() over (partition by region order by employees, role)
from mytable
),
combinations (region, ids, employees) as
(
select region, cast('#' || id || '#' as varchar2(400 byte)), employees
from data
where rn = 1
union all
select
g.region,
case when g.employees >= 6
then g.ids || ';#' || d.id || '#'
else g.ids || d.id || '#'
end,
case when g.employees >= 6
then d.employees
else d.employees + g.employees
end
from combinations g
join data d on d.region = g.region
and g.ids not like '%#' || d.id || '#%'
),
best_combinations (region, ids, employees) as
(
select region, ids, employees
from combinations
order by rank() over (partition by region -- per region
order by regexp_count(ids, '\d') desc, -- all IDs in combinations
case when employees >= 6 then 1 else 2 end, -- no incomplete combinations left if possible
regexp_count(ids, ';') desc, -- as many combinations as possible
ids) -- in order to pick only one row per region in case of ties
fetch first row with ties
),
final_groups(region, ids, rest) as
(
select region, regexp_substr(ids, '^[^;]+'), regexp_substr(ids, ';(.*$)', 1, 1, 'i', 1)
from best_combinations
union all
select region, regexp_substr(rest, '^[^;]+'), regexp_substr(rest, ';(.*$)', 1, 1, 'i', 1)
from final_groups
where rest is not null
)
select
bgs.region,
listagg(d.role, ', ') within group (order by d.role) as roles,
sum(d.employees) as employees
from data d
join final_groups bgs on bgs.ids like '%#' || d.id || '#%'
group by bgs.region, bgs.ids
order by bgs.region, bgs.ids;