djangopostgresqlprimary-keyprimary-key-design

Is it wise to construct a custom string (combination of entity, year and alphanumeric) to have as a primary key for a postgres database in Django?


I'm working on creating a Django backend for an application and I want the primary key of our entities to fulfil certain criteria. First criteria is that it is always 10 characters long so that it is easy to read and share verbally if needed. Second criteria is that it always follows the same format:-

  1. First 2 characters => Code for the entity (for example "US" is for User records, "TO" is for topics, "BO" for books etc.) so that anyone can easily say what a certain ID is for
  2. Second 2 characters => Year of the date of creation ('24', '23 etc.)
  3. Remaining 6 characters are randomly generated alphanumeric characters

I've written the following code to generate the id:-

def auto_increment_id():

alphanumeric_chars = string.ascii_letters + string.digits
last_chars = ''.join(random.choice(alphanumeric_chars) for _ in range(6))
user_id = 'US' + str(datetime.date.today().year)[2:] + str(last_chars)
return user_id

And I have it setup in the user model like this

class User(models.Model):
id = models.CharField(max_length=10, primary_key=True,
                      default=auto_increment_id, editable=False)

My question is, will this lead to performance issues or other issues as we scale up? Is this a wise setup?


Solution

  • There are issues with this strategy besides performance.

    You can only have 36^6 possible random strings of 6 alphanumeric characters. This is actually close to the same number of positive values in a 4 byte integer (PostgreSQL does not support an unsigned integer data type). So using an integer instead of a string uses 1/3 less space.

    What happens if your random six-digit string generates the same string that is already in use by another row? It would cause a duplicate key conflict, and your client would need to re-try the INSERT (possibly more than once). Obviously this would be rare, because the chance of generating the same string is low. But the client still must handle this potential error, because it's not guaranteed to never happen. So you need to write more code for very rare cases. Don't forget to include this case in your automated tests.

    Another approach would be to define a primary key constraint over three columns, however Django currently only supports single-column primary key.

    But the two issues I noted above would be solved easily by using the conventional IDENTITY primary key — a single auto-incrementing integer column.

    I think you're trying to solve a problem for which an easy solution already exists.