mysqlchardecimalzerofill

MySQL: using char(n) vs decimal(n) with zero fill


I was asked to use a database in which most of the primary keys, and other fields as well, uses char(n) to store numeric values with padding, for example:

product_id: char(8) [00005677]
user_id: char(6) [000043]
category_id: char(2) [05]

The reason they want to use it like that, is to be able to use characters (in the far future) if they want. However they have many rules based in numbers, for example, category_id from 01 to 79 correspond to a general category and from 80 to 89 is a special category and 90 to 99 is user defined category.

I personally think that using char(n) to store numbers is a bad practice. My reasons are:

  1. using char, " " != 0, 0 != 00, 05 != 5, 00043 != 000043, and so on. For that reason, the values have to be constantly checked (to prevent data corruption).
  2. If I pad a number: 0 -> 00, then I have to pay attention not to pad a character (A -> 0A)
  3. If characters are used, then ranges become strange, something like: from 01 to 79 and AB and RX and TZ and S, etc...
  4. Indexing numbers instead of chars result in a performance gain

I'm proposing to change it to decimal(n) with zerofill to make it more "error-proof", as this information is modified by different sources (web, windows client, upload csv). If they want to add more categories, for example, then updating from decimal(2) to decimal(3) will be easier.

My question then is: Am I wrong? can char(n) be trusted for this task? If "chars" are evil with numbers, then which other disadvantages am I missing in the above list (I may need better reasons if I want to win my case)?

TIA (any comment/answer will be appreciated).


Solution

  • If this was SQL Server or Oracle or any other RDBMS, I would recommend enforcing a check constraint on those columns so that the data always matched the full capacity of the column - this would ensure your identifiers are uniform.

    Unfortunately MySQL doesn't support this.

    While it wouldn't stop the annoyance of having to pad things coming into the database or in search routines, on the client or in procs in the database, it would guarantee you that the fields were clean at the lowest level.

    I find using constraints like this help avoid things getting badly out of hand.

    As far as the optimization by using numbers, if they have to accommodate non-numeric characters in the future, that's not going to be an option.

    It is very common to have natural keys (which could be candidates for a primary key) with varchar/char data, but yet instead enforce referential integrity on surrogate keys (usually some kind of autonumbering integer which is simply an internal reference, and often the clustered index and primary key).