sqlvertica

Why do we use char/varchar instead of varbinary/binary?


For the first time in my "real" life I will use a binary data type. We need to store some kind of barcode.

My senior team member told me that I should use varbinary, because it's a recommendation from documentation (we use Vertica).

I said ok, but my curiosity told me "Why?"

I thought varbinary or binary types would print on screen in unreadable text, after select. But it doesn't happen.

So I tested in Vertica and SQLite and they gave me a proper answer.

I create a table and insert data.

create table TEST_VARBINARY_2
(
    id int,
    va_r binary(5)
);

insert into TEST_VARBINARY_2 (id, va_r)
values (1, '11111')

And this is the answer.

enter image description here

Apparently the database can store string in the varbinary. So my question is: why do we use char/varchar instead of varbinary/binary?

Varbinary/binary types can store data more efficiently than varchar/char - so why do we need varchar/char?

Could you give me examples or a link to documentation when this question is discussed?

UDP I believe in a comment section I found my answer.

  1. Not all RDBMS have binary type
  2. Not all RDBMS support string functions for binary types

Solution

  • Basically, because bytes are not the same as characters.

    BINARY/VARBINARY store strings of bytes. But those bytes may correspond to printable ASCII characters

    https://docs.vertica.com/24.1.x/en/sql-reference/data-types/binary-data-types-binary-and-varbinary/ says:

    Like the input format, the output format is a hybrid of octal codes and printable ASCII characters. A byte in the range of printable ASCII characters (the range [0x20, 0x7e]) is represented by the corresponding ASCII character, with the exception of the backslash ('\'), which is escaped as '\\'. All other byte values are represented by their corresponding octal values. For example, the bytes {97,92,98,99}, which in ASCII are {a,\,b,c}, are translated to text as 'a\\bc'.

    This is why your string '1111' printed normally. Those are printable ASCII characters. They're actually the byte value 49, but when output to a text display they are printable characters.

    These binary string types store only bytes. If you want to store characters that use other encoding besides ASCII, or use a collation to guide sorting and character comparisons, you must use CHAR/VARCHAR and possibly a locale.

    You said you're using Vertica. https://docs.vertica.com/24.1.x/en/admin/about-locale/locale-and-utf-8-support/ says:

    Vertica database servers expect to receive all data in UTF-8, and Vertica outputs all data in UTF-8.

    The following string functions treat VARCHAR arguments as UTF-8 strings (when USING OCTETS is not specified) regardless of locale setting.

    (followed by the list of string functions)

    Because UTF-8 characters are variable in length, the length in characters of a string can be different from the length in bytes. The LENGTH() string function reports CHARACTER_LENGTH() when given a CHAR/VARCHAR argument, but reports OCTET_LENGTH() when given a BINARY/VARBINARY argument.

    Sorting is another important property of strings. When sorting binary data, the byte values are used for the order. Likewise if sorting character data with a binary collation. But if you want accurate sorting for a specific locale, the byte order is not necessarily the correct order for a given locale.

    Read https://docs.vertica.com/24.1.x/en/admin/about-locale/ for more about locale in Vertica.