Select substring of multiple rows with similar values

I am trying to select same substring of multiple rows without specifying the substring first.
Basically find the common part in a string. The common parts in a string range from 2 to 5 words.

Here is what I am trying to achieve:

The goal is to find the longest/last common denominator of multiple rows, creating a group / a set. The substring at the end of a string that is unique to a single row in a set should be trimmed.

Rows with that have same substring from left are considered part of a set.

"Item A 1" and "Item A 2" are part of "Item A" set.

"Item B 1" and "Item B 2" are part of "Item B" set.

The sets can include multiple parts from string.

"Item A A 1" and "Item A A 2" are part of "Item A A" set.

"Item A B 1" and "Item A B 2" are part of "Item A B" set.

Also value of the row might be just the name of a set.

"Item A" and "Item A 1" are part of "Item A" set.

"Item A B" and "Item A B 1" are part of "Item A B" set.

id	item	set
1	DEP Item 1	DEP Item
2	DEP Item 10	DEP Item
3	DEP Item 11	DEP Item
4	DEP Item 24	DEP Item
5	DEP G1 Item B 1	DEP G1 Item B
6	DEP G1 Item B 10	DEP G1 Item B
7	DEP G1 Item B 11	DEP G1 Item B
8	DEP G1 Item B 24	DEP G1 Item B
9	DEP A Item B 1 1 A	DEP A Item B 1
10	DEP A Item B 1 10	DEP A Item B 1
11	DEP A Item B 1 11 B	DEP A Item B 1
12	DEP A Item B 1 24 A B C	DEP A Item B 1

Here is sample data and a query I've been trying, which sort of gives correct results, but not in the required way.

CREATE TABLE #temp (
    id INT,
    item NVARCHAR(50)
);


INSERT INTO #temp (id, item) VALUES
(1,'DEP Item 1'),
(2,'DEP Item 10'),
(3,'DEP Item 11'),
(4,'DEP Item 24'),
(5,'DEP G1 Item B 1'),
(6,'DEP G1 Item B 10'),
(7,'DEP G1 Item B 11'),
(8,'DEP G1 Item B 24'),
(9,'DEP A Item B 1 1 A'),
(10,'DEP A Item B 1 10'),
(11,'DEP A Item B 1 11 B'),
(12,'DEP A Item B 1 24 A B C')

select *,
CASE WHEN LEN(item)-LEN(REPLACE(item, ' ', '')) < 1 THEN item
    ELSE LEFT(item, CHARINDEX(' ', item,
        CHARINDEX(' ', item)+1))
    end,
CASE WHEN LEN(item)-LEN(REPLACE(item, ' ', '')) < 2 THEN item
    ELSE LEFT(item, CHARINDEX(' ', item,
        CHARINDEX(' ', item,
        CHARINDEX(' ', item)+1)+1))
    end,
CASE WHEN LEN(item)-LEN(REPLACE(item, ' ', '')) < 4 THEN item
    ELSE LEFT(item, CHARINDEX(' ', item,
        CHARINDEX(' ', item,
        CHARINDEX(' ', item,
        CHARINDEX(' ', item)+1)+1)+1))
    end,
CASE WHEN LEN(item)-LEN(REPLACE(item, ' ', '')) < 5 THEN item
    ELSE LEFT(item, CHARINDEX(' ', item,
        CHARINDEX(' ', item,
        CHARINDEX(' ', item,
        CHARINDEX(' ', item,
        CHARINDEX(' ', item)+1)+1)+1)+1))
    end
from #temp

Solution

First, we make up all possible combinations of word sets for each id(item) where order of words is remains - this is path's.
6, DEP G1 Item B 1 ->(DEP), (DEP,G1), (DEP,G1,Item), (DEP,G1,Item,B), (DEP,G1, Item,B,1)
9, DEP G1 Item B 11->(DEP), (DEP,G1), (DEP,G1,Item), (DEP,G1,Item,B), (DEP,G1, Item,B,11)

Then select rows with common path's - thru comparing path's for every item with all item's path's - select rows with inner self join on t1.path=t2.path and t1.id<>t2.id.

And last - take greatest path's (order by count of words in path desc) for items.

See example

with r as(
  select id,1 n,item
     ,case when charindex(' ',item)>0 then  
        trim(substring(item,charindex(' ',item)+1,100))
      else ''
      end rest
     ,cast(case when charindex(' ',item)>0 then  
        trim(substring(item,1,charindex(' ',item)-1))
      else trim(item)
      end  as varchar(100)) path
from #temp
  union all
  select id,n+1 n,item
     ,case when charindex(' ',rest)>0 then  
        trim(substring(rest,charindex(' ',rest)+1,100))
      else ''
      end rest
     ,cast(concat(path,','
       ,case when charindex(' ',rest)>0 then  
        trim(substring(rest,1,charindex(' ',rest)-1))
        else trim(rest)
        end) as varchar(100)) path
from r where len(rest)>0
)
select s.path,n,t.id,t.item
from #temp t 
left join (
   select  t1.id,t1.path,t1.n
     ,row_number()over(partition by t1.id order by t1.n desc) rn
   from r t1
   inner join r t2 on t1.path=t2.path and t1.id<>t2.id
  ) s on s.rn=1 and s.id=t.id
order by n,path,t.id

output is

path (`set`)	n	id	item
DEP	1	1	DEP
DEP,Item	2	2	DEP Item 1
DEP,Item	2	3	DEP Item 10
DEP,Item	2	5	DEP Item 24
DEP,G1,Item,B	4	6	DEP G1 Item B 1
DEP,G1,Item,B	4	8	DEP G1 Item B 11
DEP,G1,Item,B	4	9	DEP G1 Item B 24
DEP,A,Item,B,1	5	11	DEP A Item B 1 10
DEP,A,Item,B,1	5	13	DEP A Item B 1 24 A B C
DEP,A,Item,B,1,1,A	7	10	DEP A Item B 1 1 A
DEP,A,Item,B,1,1,A	7	14	DEP A Item B 1 1 A B C
null	null	15	Hello world

There item id=15 not have common words with other rows.

on test data

CREATE TABLE #temp (id INT, item NVARCHAR(50));
INSERT INTO #temp (id, item) VALUES
(1,'DEP'),
(2,'DEP     Item 1'),
(3,'DEP Item 10'),
--(4,'DEP Item 11'),
(5,'DEP Item 24'),
(6,'DEP G1 Item B 1'),
--(7,'DEP G1 Item B 10'),
(8,'DEP G1 Item B 11'),
(9,'DEP G1 Item B 24'),
(10,'DEP A Item B 1 1 A'),
(11,'DEP A Item B 1 10'),
--(12,'DEP A Item B 1 11 B'),
(13,'DEP A Item B 1 24 A B C'),
(14,'DEP A Item B 1 1 A B C'),
(15,'Hello world')

;

Fiddle