mysqltypesvarchar

Dynamic size for data type in MySQL


I want to save a string in a column of table in MySQL. Sometimes it is 2 characters and sometimes very larger. 50000 characters. Is it a good idea to user varchar(50000). If I only save 2 characters in the column, does it use all 4998 bytes or just 2 bytes?


Solution

  • As documented under Data Type Storage Requirements:

    Storage Requirements for String Types

    In the following table, M represents the declared column length in characters for nonbinary string types and bytes for binary string types. L represents the actual length in bytes of a given string value.

    Data Type Storage Required
    CHAR(M) M × w bytes, 0 <= M <= 255, where w is the character in the character set
    BINARY(M) M bytes, 0 <= M <= 255
    VARCHAR(M), VARBINARY(M) L + 1 bytes if column values require 0 – 255 bytes, L + 2 bytes if values may require more than 255 bytes
    TINYBLOB, TINYTEXT L + 1 bytes, where L < 28
    BLOB, TEXT L + 2 bytes, where L < 216
    MEDIUMBLOB, MEDIUMTEXT L + 3 bytes, where L < 224
    LONGBLOB, LONGTEXT L + 4 bytes, where L < 232
    ENUM('value1','value2',...) 1 or 2 bytes, depending on the number of enumeration values (65,535 values maximum)
    SET('value1','value2',...) 1, 2, 3, 4 or 8 bytes, depending on the number of set members (64 members maximum)

    Variable-length string types are stored using a length prefix plus data. The length prefix requires from one to four bytes depending on the data type, and the value of the prefix is L (the byte length of the string). For example, storage for a MEDIUMTEXT value requires L bytes to store the value plus three bytes to store the length of the value.

    To calculate the number of bytes used to store a particular CHAR, VARCHAR, or TEXT column value, you must take into account the character set used for that column and whether the value contains multi-byte characters. In particular, when using the utf8 (or utf8mb4) Unicode character set, you must keep in mind that not all characters use the same number of bytes and can require up to three (four) bytes per character. For a breakdown of the storage used for different categories of utf8 or utf8mb4 characters, see Section 10.1.10, “Unicode Support”.

    VARCHAR, VARBINARY, and the BLOB and TEXT types are variable-length types. For each, the storage requirements depend on these factors:

    • The actual length of the column value

    • The column's maximum possible length

    • The character set used for the column, because some character sets contain multi-byte characters

    For example, a VARCHAR(255) column can hold a string with a maximum length of 255 characters. Assuming that the column uses the latin1 character set (one byte per character), the actual storage required is the length of the string (L), plus one byte to record the length of the string. For the string 'abcd', L is 4 and the storage requirement is five bytes. If the same column is instead declared to use the ucs2 double-byte character set, the storage requirement is 10 bytes: The length of 'abcd' is eight bytes and the column requires two bytes to store lengths because the maximum length is greater than 255 (up to 510 bytes).

    Therefore, in answer your question:

    If I only save 2 characters in the column, does it use all 4998 bytes or just 2 bytes?

    A VARCHAR(50000) column storing a 2-character string would require L+2 bytes, where L is the number of bytes required to encode that 2-character string in the column's character set: it certainly will not use "all 4998 bytes".