Table1 has roughly 50k rows, like this:
| Id | Name | Dob | Device |
|---|---|---|---|
| 1 | Bill | 09/30/1987 | bira1 |
| 2 | Kayle | 06/12/1967 | kale2 |
| 3 | Gill | 05/23/1976 | bira1 |
| 4 | Kane | 01/30/1963 | bira3 |
| 5 | Hena | 02/12/1978 | kale1 |
Table2 has 200 rows:
| Type | Device | color | price |
|---|---|---|---|
| Apple | bira1 | Grey | 100$ |
| Apple | bira1 | Green | 120$ |
| Samsung | kale2 | silver | 200$ |
| Samsung | kale1 | blue | 10$ |
| Samsung | kale2 | black | 190$ |
Table2 doesn't have any Id or a primary key. When I join the tables using column Device, it usually doesn't return the right information, since one person can only have one device in Table1, but there are many different colors for the same device and it is difficult to find which colors the person has.
How can I normalize Table2?
It usually takes a long time to run the query.
You could add an id int identity primary key to your Device table. Then add a device_id int foreign key references Device(id) to your People table.
To populate it:
Device that exists only once in Device: use its generated id to fill in the device_id of all corresponding entries in PeopleDevice to represent a generic version of the device (without price nor color), and use its id as the device_id (until the people tell you which model exactly they bought)Then add a not null constraint to device_id, and drop your People.Device column (you now have normalized it).
Technically:
-- Devices having only one variant are attributed to people:
update p
set p.DeviceId = d.Id
from People p
join (select Device, min(Id) Id from Device group by Device having count(*) = 1) d
on d.Device = p.Device;
-- Devices having many variants, with no possibility to know which one each user has, generate an additional "generic" variant that gets attributed to users:
insert into Device (Type, Device)
select Type, Device
from Device
group by Type, Device
having count(*) > 1
;
update p
set p.DeviceId = d.Id
from People p
join (select Device, Id from Device where color is null and price is null) d
on d.Device = p.Device
where p.DeviceId is null;
-- People with an unknown device get an ad-hoc one generated:
insert into Device (Device)
select distinct Device from People where DeviceId is null;
update People
set DeviceId = (select Id from Device where Device.Device = People.Device)
where DeviceId is null;
-- Now get rid of the redundant column, and enforce the constraint:
alter table People drop column Device;
alter table People alter column DeviceId int not null;
Which ends up with:
| Type | Device | color | price | Id |
|---|---|---|---|---|
| Apple | bira1 | Grey | 100$ | 1 |
| Apple | bira1 | Green | 120$ | 2 |
| Samsung | kale2 | silver | 200$ | 3 |
| Samsung | kale1 | blue | 10$ | 4 |
| Samsung | kale2 | black | 190$ | 5 |
| Apple | bira1 | null | null | 6 |
| Samsung | kale2 | null | null | 7 |
| null | bira3 | null | null | 8 |
| Id | Name | Dob | DeviceId |
|---|---|---|---|
| 1 | Bill | 1987-09-30 | 6 |
| 2 | Kayle | 1967-06-12 | 7 |
| 3 | Gill | 1976-05-23 | 6 |
| 4 | Kane | 1963-01-30 | 8 |
| 5 | Hena | 1978-02-12 | 4 |
(see it running on your dataset)