djangodjango-modelslegacy-database

Combine two unrelated tables/models with same primary key in Django


I have two unrelated tables with the same primary key.

ip            mac
11.11.11.11   48-C0-09-1F-9B-54
33.33.33.33   4E-10-A3-BC-B8-9D
44.44.44.44   CD-00-60-08-56-2A
55.55.55.55   23-CE-D3-B1-39-A6

ip            type     owner
22.22.22.22   laptop   John Doe
33.33.33.33   server   XYZ Department
44.44.44.44   VM       Mary Smith
66.66.66.66   printer  ZWV Department

The first table is automatically refreshed every minute. I can't change the database structure or the script that populates it.

Both tables have ip as PRIMARY KEY.

In a view, I would like to display a table like this:

ip           mac               type    owner          Alert
11.11.11.11  48-C0-09-1F-9B-54                        Unauthorized
55.55.55.55  23-CE-D3-B1-39-A6                        Unauthorized
22.22.22.22                    laptop  John Doe       Down
66.66.66.66                    printer ZWV Department Down
33.33.33.33  4E-10-A3-BC-B8-9D server  XYZ Department OK
44.44.44.44  CD-00-60-08-56-2A VM      Mary Smith     OK

How can I model this? Should I make one of the two primary keys a foreign key into the other one?

Once the code is in operation, there will be lots of data, so I want to make sure it's fast enough.

What is the fastest way to retrieve the data?


Update:

I tried using OneToOneField for the second table.

This helps me get records that are in both tables, and the records for unauthorized devices (IPs missing in second table):

ip           mac               type    owner          Alert
11.11.11.11  48-C0-09-1F-9B-54                        Unauthorized
55.55.55.55  23-CE-D3-B1-39-A6                        Unauthorized
33.33.33.33  4E-10-A3-BC-B8-9D server  XYZ Department OK
44.44.44.44  CD-00-60-08-56-2A VM      Mary Smith     OK

but I can't get the devices that are down (IP's missing in first table):

22.22.22.22                    laptop  John Doe       Down
66.66.66.66                    printer ZWV Department Down

I asked for help here, but it seems it can't be done with OneToOneField


Solution

  • General idea

    You can use qs.union:

    from django.db import models
    from django.db.models import F, OuterRef, Subquery, Value
    from django.db.models.functions import Coalesce
    
    # OperationalDevice fields: ip, mac
    # AllowedDevice fields: ip, type, owner
    
    USE_EMPTY_STR_AS_DEFAULT = True
    
    null_char_field = models.CharField(null=True)
    if USE_EMPTY_STR_AS_DEFAULT:
        default_value = ''
    else:
        default_value = None
    
    # By default Expressions treat strings as "field_name" so if you want to use
    # empty string as a second argument for Coalesce, then you should wrap it in
    # `Value()`.
    # `None` can be used there without wrapping in `Value()`, but in
    # `.annotate(type=NoneValue)` it still should be wrapped, so it's easier to
    # just "always wrap".
    default_value = Value(default_value, output_field=null_char_field)
    
    operational_devices_subquery = OperationalDevice.objects.filter(ip=OuterRef('ip'))
    
    
    qs1 = (
        AllowedDevice.objects
        .all()
        .annotate(
            mac=Coalesce(
                Subquery(operational_devices_subquery.values('mac')[:1]),
                default_value,
                output_field=null_char_field,
            ),
        )
    )
    
    qs2 = (
        OperationalDevice.objects
        .exclude(
            ip__in=qs1.values('ip'),
        )
        .annotate(
            type=default_value,
            owner=default_value,
        )
    )
    
    final_qs = qs1.union(qs2)
    
    

    Generic approach for multiple fields

    A more complex but "universal" approach may use Model._meta.get_fields(). It will be easier to use for cases where "second" model have more that 1 extra field (not only ip,mac). Example code (not tested, but gives general impression):

    # One more import:
    from django.db.models.fields import NOT_PROVIDED
    
    common_field_name = 'ip'
    
    # OperationalDevice fields: ip, mac, some_more_fields ...
    # AllowedDevice fields: ip, type, owner
    
    operational_device_fields = OperationalDevice._meta.get_fields()
    operational_device_fields_names = {_f.name for _f in operational_device_fields}  # or set((_f.name for ...))
    
    allowed_device_fields = AllowedDevice._meta.get_fields()
    allowed_device_fields_names = {_f.name for _f in allowed_device_fields}  # or set((_f.name for ...))
    
    operational_devices_subquery = OperationalDevice.objects.filter(ip=OuterRef(common_field_name))
    
    left_joined_qs = (  # "Kind-of". Assuming AllowedDevice to be "left" and OperationalDevice to be "right"
        AllowedDevice.objects
        .all()
        .annotate(
            **{
                _f.name: Coalesce(
                    Subquery(operational_devices_subquery.values(_f.name)[1]),
                    Value(_f.get_default()),  # Use defaults from model definition
                    output_field=_f,
                )
                for _f in operational_device_fields
                if _f.name not in allowed_device_fields_names
                # NOTE: if fields other than `ip` "overlap", then you might consider
                # changing logic here. Current implementation keeps fields from the
                # AllowedDevice
            }
            # Unpacked dict is partially equivalent to this:
            # mac=Coalesce(
            #     Subquery(operational_devices_subquery.values('mac')[:1]),
            #     default_for_mac_eg_fallback_text_value,
            #     output_field=null_char_field,
            # ),
            # other_field = Coalesce(...),
            # ...
        )
    )
    
    lonely_right_rows_qs = (
        OperationalDevice.objects
        .exclude(
            ip__in=AllowedDevice.objects.all().values(common_field_name),
        )
        .annotate(
            **{
                _f.name: Value(_f.get_default(), output_field=_f),  # Use defaults from model definition
                for _f in allowed_device_fields
                if _f.name not in operational_device_fields_names
                # NOTE: See previous NOTE
            }
        )
    )
    
    final_qs = left_joined_qs.union(lonely_right_rows_qs)
    
    

    Using OneToOneField for "better" SQL

    Theoretically you can use device_info = models.OneToOneField(OperationalDevice, db_column='ip', primary_key=True, related_name='status_info'): in AllowedDevice. In this case your first QS may be defined without use of Subquery:

    from django.db.models import F
    
    # Now 'ip' is not in field names ('device_info' is there), so add it:
    allowed_device_fields_names.add(common_field_name)
    
    # NOTE: I think this approach will result in a more compact SQL query without 
    # multiple `(SELECT "some_field" FROM device_info_table ... ) as "some-field"`.
    # This also might result in better query performance.
    honest_join_qs = (
        AllowedDevice.objects
        .all()
        .annotate(
            **{
                _f.name: F(f'device_info__{_f.name}')
                for _f in operational_device_fields
                if _f.name not in allowed_device_fields_names
            }
        )
    )
    
    final_qs = honest_join_qs.union(lonely_right_rows_qs)
    # or:
    # final_qs = honest_join_qs.union(
    #     OperationalDevice.objects.filter(status_info__isnull=True).annotate(**missing_fields_annotation)
    # )
    # I'm not sure which approach is better performance-wise...
    # Commented one will use something like:
    # `SELECT ... FROM "device_info_table" LEFT OUTER JOIN "status_info_table" ON ("device_info_table"."ip" = "status_info_table"."ip") WHERE "status_info_table"."ip" IS NULL
    #
    # So it might be a little better than first with `union(QS.exclude(ip__in=honest_join_qs.values('ip'))`.
    # Because later uses SQL like this:
    # `SELECT ... FROM "device_info_table" WHERE NOT ip IN (SELECT ip FROM "status_info_table")`
    #
    # But it's better to measure timings of both approaches to be sure.
    # @GrannyAching, can you compare them and tell in the comments which one is better ?
    

    P.S. To automate models definition you can use manage.py inspectdb

    P.P.S. Maybe multi-table inheritance with custom OneToOneField(..., parent_link=True) may be more helpful for you than using union.