djangopostgresqlforeign-keysmany-to-manygeneric-foreign-key

How to tag every table in my database? Generic Foreign Keys, array column, or other method


I'm currently designing a PostgreSQL database where I need to implement a tagging functionality. Essentially, any item from many tables (7+) in my database can be tagged via key-value tags, very much like Azure tags.

I'm considering different approaches for implementing this feature and I'm confused about the best approach, since this must be a common problem in many data models:

  1. Using Django's Generic Foreign Keys.

    One approach I'm considering is leveraging Django's Generic Foreign Keys functionality to create a versatile "Tags" table that can reference any other table in the database. This allows flexibility and avoids complex logic. However, there might be drawbacks like performance implications or the database not making sense without Django's ORM.

  2. Manual implementation in a single table, without Generic Foreign Keys.

    Another option is to implement the "Tags" table manually without relying on Django's Generic Foreign Keys. I would create a separate table that can reference any other table in the database, but without the use of formal foreign keys. This would require to either store some logic in the database or the back-end, being more complex to maintain.

  3. Separate cross-reference tags table for each table.

    Alternatively, I could create a separate table (i.e. car_tags, house_tags, person_tags) for each table in my database that requires tagging. This approach would ensure clear separation and potentially better performance, but it might result in redundancy and increased complexity in queries and maintenance.

  4. Adding a tags TEXT[] array column in every table.

    Introducing a "tags" array column in each table that requires tagging, rather than maintaining a general "Tags" table. This simplifies queries but may limit flexibility and scalability.

Which approach is the most suitable for this use-case? I'm more concerned about the maintainability and scalability of the database, rather than performance. If these are not appropriate, you can suggest any best practices for implementing a flexible tagging system in PostgreSQL.


Solution

    1. Django's GenericForeignKey are just weak, unenforced/unguarded references from db's perspective. The Tags table would have two additional columns, one identifying what type of thing it's tagging, the other one holding a unique identifier of that thing.

      It shouldn't be too hard to make sense out of that, with or without Django ORM. It's actually how PostgreSQL manages some things internally: pg_class.relkind tells you what type of thing it's listing, pg_class.oid identifies the specific thing. When you look things up in system views or information_schema, it'll filter down to the specific type of thing listed in pg_class and join to it on oid.

    2. You can easily implement this on your own a similar way Django does:

      create table tag (
          id bigint generated by default as identity primary key,
          name text);
      create table tag_anything (
          tag_id bigint references tag(id),
          other_thing_type text,
          other_thing_id text);
      create view cat_tagged as 
          select cat.*,array_agg(tag.name) as tags
          from cat 
          join tag_anything ta 
            on ta.other_thing_type='cat'
           and ta.other_thing_id=cat.id
          join tag
            on tag.id=ta.tag_id;
          group by cat.id;
      

      One problem is setting up the views and joins for those things, another that each thing can use a different, possibly multi-column primary key, which you need to map to the single generic foreign key reference. Some systems use uuid as the pk for everything, making each identifier of each thing, globally unique and uniform in type, which simplifies this.
      You also need to implement your own mechanics to handle ON UPDATE, ON DELETE behaviour as well as the link validation to avoid consistency loss.

    3. That's the normalized approach. If you think about what Django does with the GenericForeignKey and what you may try to reimplement, this is just a single table-per-thing more than that: tag_anything from the example above becomes tag_cat, loses the thing_type column and gets a proper foreign key reference, with adequate type.

      You get the on update, on delete and consistency checks for free.

    4. Adding a text[] or a jsonb everywhere sounds like a nightmare. You'd duplicate tags all over the place and weigh every tagged thing down. Querying based on those would have to hit every tagged table in the db.