sqlsql-serverdatabase-designdatabase-normalization

Normalize Table2 where one person can only have one device in Table1, but there are many different colors for the same device in Table2


Table1 has roughly 50k rows, like this:

Id Name Dob Device
1 Bill 09/30/1987 bira1
2 Kayle 06/12/1967 kale2
3 Gill 05/23/1976 bira1
4 Kane 01/30/1963 bira3
5 Hena 02/12/1978 kale1

Table2 has 200 rows:

Type Device color price
Apple bira1 Grey 100$
Apple bira1 Green 120$
Samsung kale2 silver 200$
Samsung kale1 blue 10$
Samsung kale2 black 190$

Table2 doesn't have any Id or a primary key. When I join the tables using column Device, it usually doesn't return the right information, since one person can only have one device in Table1, but there are many different colors for the same device and it is difficult to find which colors the person has.

How can I normalize Table2?

It usually takes a long time to run the query.


Solution

  • You could add an id int identity primary key to your Device table. Then add a device_id int foreign key references Device(id) to your People table.

    To populate it:

    Then add a not null constraint to device_id, and drop your People.Device column (you now have normalized it).

    Technically:

    -- Devices having only one variant are attributed to people:
    update p
    set p.DeviceId = d.Id
    from People p
    join (select Device, min(Id) Id from Device group by Device having count(*) = 1) d
    on d.Device = p.Device;
    -- Devices having many variants, with no possibility to know which one each user has, generate an additional "generic" variant that gets attributed to users:
    insert into Device (Type, Device)
      select Type, Device
      from Device
      group by Type, Device
      having count(*) > 1
    ;
    update p
    set p.DeviceId = d.Id
    from People p
    join (select Device, Id from Device where color is null and price is null) d
    on d.Device = p.Device
    where p.DeviceId is null;
    -- People with an unknown device get an ad-hoc one generated:
    insert into Device (Device)
      select distinct Device from People where DeviceId is null;
    update People
    set DeviceId = (select Id from Device where Device.Device = People.Device)
    where DeviceId is null;
    -- Now get rid of the redundant column, and enforce the constraint:
    alter table People drop column Device;
    alter table People alter column DeviceId int not null;
    

    Which ends up with:

    Type Device color price Id
    Apple bira1 Grey 100$ 1
    Apple bira1 Green 120$ 2
    Samsung kale2 silver 200$ 3
    Samsung kale1 blue 10$ 4
    Samsung kale2 black 190$ 5
    Apple bira1 null null 6
    Samsung kale2 null null 7
    null bira3 null null 8
    Id Name Dob DeviceId
    1 Bill 1987-09-30 6
    2 Kayle 1967-06-12 7
    3 Gill 1976-05-23 6
    4 Kane 1963-01-30 8
    5 Hena 1978-02-12 4

    (see it running on your dataset)