I have a table representing the transitive closure of an organizational hierarchy (i.e., its a tree with a single root):
create table ancestry (
ancestor integer,
descendant integer,
distance integer
);
I have another table that contains the organizations that each user is allowed to access:
create table accessible (
user integer,
organization integer
);
The system shows the user a roll-up of expenditures associated with each organization the user can access. I could always start by showing the user a view of the company (i.e., the root) showing the user a list of immediate child organizations and how much his organizations contribute to the total. In most cases, there would be a single child and the user would be required to drill-down several levels before seeing multiple children. I would prefer to start the presentation with the first organization that shows multiple children (i.e., the LCA).
For a given user, I can find the set of paths to the root easy enough but am having trouble finding the least common ancestor. I am using postgresql 9.1 but would prefer a solution that is database agnostic. In the worst case, I can pull the paths to root back into the application's code and calculate the LCA there.
I took a fresh look at this and developed the following solution. I used a common-table-expression to make it easier to understand how it operates but it could easily be written using a sub-query.
with
hit (id, count) as (
select
ancestry.ancestor
,count(ancestry.descendant)
from
accessible
inner join ancestry
on accessible.organization = ancestry.descendant
where
accessible.user = @user_id
group by
ancestry.ancestor
)
select
ancestry.descendant as lca
from
hit
inner join ancestry
on ancestry.descendant = hit.id
and ancestry.ancestor = @company_id
order by
hit.count desc
,ancestry.distance desc
limit 1
;
The hit CTE counts, for each organization in the hierarchy, the number of paths from a child to the root that traverse the organization. The LCA is then the organization with the most traversals. In the event of a tie, the organization farthest from the root (i.e., max(distance)) is the actual LCA. This is best illustrated with an example.
A
|
B
/ \
C D
Assuming we wish to find the LCA of nodes C and D from the tree above. The hit CTE produces the following counts:
Node Count
A 2
B 2
C 1
D 1
The main query adds the distance:
Node Count Distance
A 2 0
B 2 1
C 1 2
D 1 2
The main query then orders the results by descending count and distance
Node Count Distance
B 2 1
A 2 0
C 1 2
D 1 2
The LCA is the first item in the list.