sql sql-server database-design database-normalization

Normalize Table2 where one person can only have one device in Table1, but there are many different colors for the same device in Table2

Table1 has roughly 50k rows, like this:

Id	Name	Dob	Device
1	Bill	09/30/1987	bira1
2	Kayle	06/12/1967	kale2
3	Gill	05/23/1976	bira1
4	Kane	01/30/1963	bira3
5	Hena	02/12/1978	kale1

Table2 has 200 rows:

Type	Device	color	price
Apple	bira1	Grey	100$
Apple	bira1	Green	120$
Samsung	kale2	silver	200$
Samsung	kale1	blue	10$
Samsung	kale2	black	190$

Table2 doesn't have any Id or a primary key. When I join the tables using column Device, it usually doesn't return the right information, since one person can only have one device in Table1, but there are many different colors for the same device and it is difficult to find which colors the person has.

How can I normalize Table2?

It usually takes a long time to run the query.

Solution

You could add an id int identity primary key to your Device table. Then add a device_id int foreign key references Device(id) to your People table.

To populate it:

for each Device that exists only once in Device: use its generated id to fill in the device_id of all corresponding entries in People
for the others: add a row in Device to represent a generic version of the device (without price nor color), and use its id as the device_id (until the people tell you which model exactly they bought)

Then add a not null constraint to device_id, and drop your People.Device column (you now have normalized it).

Technically:

-- Devices having only one variant are attributed to people:
update p
set p.DeviceId = d.Id
from People p
join (select Device, min(Id) Id from Device group by Device having count(*) = 1) d
on d.Device = p.Device;
-- Devices having many variants, with no possibility to know which one each user has, generate an additional "generic" variant that gets attributed to users:
insert into Device (Type, Device)
  select Type, Device
  from Device
  group by Type, Device
  having count(*) > 1
;
update p
set p.DeviceId = d.Id
from People p
join (select Device, Id from Device where color is null and price is null) d
on d.Device = p.Device
where p.DeviceId is null;
-- People with an unknown device get an ad-hoc one generated:
insert into Device (Device)
  select distinct Device from People where DeviceId is null;
update People
set DeviceId = (select Id from Device where Device.Device = People.Device)
where DeviceId is null;
-- Now get rid of the redundant column, and enforce the constraint:
alter table People drop column Device;
alter table People alter column DeviceId int not null;

Which ends up with:

Type	Device	color	price	Id
Apple	bira1	Grey	100$	1
Apple	bira1	Green	120$	2
Samsung	kale2	silver	200$	3
Samsung	kale1	blue	10$	4
Samsung	kale2	black	190$	5
Apple	bira1	null	null	6
Samsung	kale2	null	null	7
null	bira3	null	null	8

Id	Name	Dob	DeviceId
1	Bill	1987-09-30	6
2	Kayle	1967-06-12	7
3	Gill	1976-05-23	6
4	Kane	1963-01-30	8
5	Hena	1978-02-12	4

(see it running on your dataset)