For the first time in my "real" life I will use a binary data type. We need to store some kind of barcode.
My senior team member told me that I should use varbinary
, because it's a recommendation from documentation (we use Vertica).
I said ok, but my curiosity told me "Why?"
I thought varbinary
or binary
types would print on screen in unreadable text, after select. But it doesn't happen.
So I tested in Vertica and SQLite and they gave me a proper answer.
I create a table and insert data.
create table TEST_VARBINARY_2
(
id int,
va_r binary(5)
);
insert into TEST_VARBINARY_2 (id, va_r)
values (1, '11111')
And this is the answer.
Apparently the database can store string in the varbinary
.
So my question is: why do we use char/varchar
instead of varbinary/binary
?
Varbinary/binary
types can store data more efficiently than varchar/char
- so why do we need varchar/char
?
Could you give me examples or a link to documentation when this question is discussed?
UDP I believe in a comment section I found my answer.
Basically, because bytes are not the same as characters.
BINARY
/VARBINARY
store strings of bytes. But those bytes may correspond to printable ASCII characters
https://docs.vertica.com/24.1.x/en/sql-reference/data-types/binary-data-types-binary-and-varbinary/ says:
Like the input format, the output format is a hybrid of octal codes and printable ASCII characters. A byte in the range of printable ASCII characters (the range
[0x20, 0x7e]
) is represented by the corresponding ASCII character, with the exception of the backslash ('\'
), which is escaped as'\\'
. All other byte values are represented by their corresponding octal values. For example, the bytes{97,92,98,99}
, which in ASCII are{a,\,b,c}
, are translated to text as'a\\bc'
.
This is why your string '1111'
printed normally. Those are printable ASCII characters. They're actually the byte value 49, but when output to a text display they are printable characters.
These binary string types store only bytes. If you want to store characters that use other encoding besides ASCII, or use a collation to guide sorting and character comparisons, you must use CHAR
/VARCHAR
and possibly a locale.
You said you're using Vertica. https://docs.vertica.com/24.1.x/en/admin/about-locale/locale-and-utf-8-support/ says:
Vertica database servers expect to receive all data in UTF-8, and Vertica outputs all data in UTF-8.
The following string functions treat
VARCHAR
arguments as UTF-8 strings (whenUSING OCTETS
is not specified) regardless of locale setting.
(followed by the list of string functions)
Because UTF-8 characters are variable in length, the length in characters of a string can be different from the length in bytes. The LENGTH()
string function reports CHARACTER_LENGTH()
when given a CHAR
/VARCHAR
argument, but reports OCTET_LENGTH()
when given a BINARY
/VARBINARY
argument.
Sorting is another important property of strings. When sorting binary data, the byte values are used for the order. Likewise if sorting character data with a binary collation. But if you want accurate sorting for a specific locale, the byte order is not necessarily the correct order for a given locale.
Read https://docs.vertica.com/24.1.x/en/admin/about-locale/ for more about locale in Vertica.