sqlpostgresqldatabase-designmany-to-manyblogs

Insert record into many-to-many relationship


I want to create a database for my blog website where I have many posts and categories. A post can have many categories and a category can have many posts. I couldn't figure out how to implement it in PostgreSQL.

This is the category table:

CREATE TABLE category (
   category_name VARCHAR(255) PRIMARY KEY
);

And this is the Post table:

CREATE TABLE post (
   post_id INTEGER NOT NULL PRIMARY KEY,
   title VARCHAR(255),
   category_name VARCHAR(255) REFERENCES category (category_name)
                 ON UPDATE CASCADE ON DELETE CASCADE,
   date DATE
);

I find it difficult to INSERT a record.

The following doesn't work:

INSERT INTO post(post_id, title, category_name, date)
VALUES (1, 'How to create a website',['technology','website'], '2025-06-04');

I could've inserted some data into the category table, but only 1 value works at a time. How do I add multiple values?


Solution

  • With arrays

    PostgreSQL supports arrays of values. This will be the simplest way to implement your needs:
    You'll just have to declare your column as category_name VARCHAR(255)[],
    and insert as ARRAY['technology','website'] (instead of the ['technology','website'] you put in your question).

    However, even if PostgreSQL provides a bunch of functions that make arrays easier to handle, quicker, and cleaner than comma-separating your values, this is still a shaky model that won't be helped by the RDBMS, and especially, as of 2025, you can't get them be FOREIGN KEYS.
    Maintaining integrity will rely on defining some triggers on both tables (INSERT OR UPDATE OF category_name ON post, UPDATE OR DELETE ON category). Hint: using serials instead of the name could remove the need for UPDATE ON category, and make it lighter on storage, on the other hand SELECTing the categories will then of course require one additional indirection.

    You can get a glimpse at it in a small fiddle to play with.

    With an n..n link table

    Then of course you've got the all-terrain join table, that you could call post_category, to remove that multivalued column and at least satisfy 1NF.

    This will require a two-steps INSERT, or using PostgreSQL's ability to include INSERT … RETURNING in CTES:

    WITH
    i AS
    (
        INSERT INTO post(post_id, title, date)
        VALUES (1, 'How to create a website', '2025-06-04')
        RETURNING post_id
    )
    INSERT INTO post_category
    SELECT post_id, category_name
    FROM i, category
    WHERE category_name in ('technology','website');
    

    This is shown in its own fiddle.

    Of course with CTEs you can add a lot of other things, for example:

    Handling SELECTs in an 1NF model

    You perhaps already noticed that having a clean model forces to think of a clean implementation, even for SELECTs.

    For example, if you do not pay attention, with a blog containing 2 posts (post 1 with categories A and B, and post 2 with categories A, B and C), a summary SELECT 'Over the whole blog, you posted '||COUNT(post.post_id)||' posts over '||COUNT(category)||' different categories' FROM post LEFT JOIN post_category USING (post_id);, would return "you posted 5 posts over 5 categories" instead of "2 posts over 3 categories".

    DISTINCT is a quick-and-really-dirty way (that will not work with joins of more that two 1-to-n or n-to-n tables),
    so I'd advize you to always think in sets of rows that each represent 1 physical entity (while DISTINCT is just hiding that you got 2 or 3 rows to represent each "physical unit" which is a post).

    Better strategies would be based on grouped subqueries: