I'm creating a database to store the events of mobile apps recovered from multiple sources. Problem is that rows from the event table don't have much meaning to the user as it's mostly a succession of integers. Forcing them to make multiple joins or multiple queries.
CREATE TABLE source (
id serial PRIMARY KEY,
value string NOT NULL
);
CREATE TABLE application (
id serial PRIMARY KEY,
value string NOT NULL
);
CREATE TABLE platform (
id serial PRIMARY KEY,
value string NOT NULL
);
CREATE TABLE country (
id serial PRIMARY KEY,
value string NOT NULL
);
CREATE TABLE event (
id serial PRIMARY KEY,
source_id integer REFERENCES source(id),
application_id integer REFERENCES application(id),
platform_id integer REFERENCES platform(id),
country_id integer REFERENCES country(id),
...
updated_at date NOT NULL,
value decimal(100, 2) NOT NULL
);
I thought of directly using the value of the "secondary" tables as a primary key (as it's unique and not null) that I would reference in the event table. It would look like that:
CREATE TABLE source (
value string PRIMARY KEY
);
CREATE TABLE application (
value string PRIMARY KEY
);
CREATE TABLE platform (
value string PRIMARY KEY
);
CREATE TABLE country (
value string PRIMARY KEY
);
CREATE TABLE event (
id serial PRIMARY KEY,
source string REFERENCES source(value),
application string REFERENCES application(value),
platform string REFERENCES platform(value),
country string REFERENCES country(value),
...
updated_at date NOT NULL,
value decimal(100, 2) NOT NULL
);
I think it might also be good this way as I don't really see an added value at using a surrogate key in this situation. Also prevents me from using views which might have slower performances as it executes a query every time I use the view in a query.
What do you think of this option ?
"real" systems usually use surrogate keys. There are multiple reasons why:
There is nothing per se wrong with using strings. But in practice, they are not used for this purpose.