I'm using in Postgres 17.1 on Ubuntu.
I want to understand the regex for characters - what is the difference between:
1. delete from tablea where col~'^[\u2611-\uffffff]+$'
2. delete from tablea where col~'^[\2611-\uffffff]+$'
I meant to delete some rows with symbols and non English with (1) but I used reg ex #2. Did I delete something important?
And when to use:
\\unnnn \\xnn \\nnnn
and what are the ranges for each?
Which of those is ASCII and which is Unicode?
See official documentation on escape sequences in regexes.
For quick experiments, try
SELECT substring(E'put string here' from 'put regex here');
\NNN
vs. \xNN
vs. \uNNNN
\47
and \047
stand for a char with octal value 47oct = 4⋅8¹ + 7⋅8⁰ = 39
\x27
, \x027
, \x0027
, ... stand for a char with hexadecimal value 27hex = 2⋅16¹ + 7⋅16⁰ = 39
In this example, all these variants stand for char '
and can be used interchangeably.
\u0053
is just like \x53
, \x053
, or \x0053
, but uses exactly 4 hexadecimal digits. This may be beneficial when followed by literal characters.
Example: You want a string for 'A
. To circumvent issues with the '
-character inside a 'single-quoted string'
, you cannot write \x27A
or \x0027A
, as the A
would be interpreted as part of the number in both cases. Instead, you can write \u0027A
.
\NNN
is similar in this regard. It uses exactly 2-3 octal digits.
\1234
is escape sequence \123
followed by the literal char 4
.
\2611
instead of \u2611
. Did i delete something important?Yes, probably.
\2611
is \261
followed by a literal 1
.
\uffffff
is \xffff
followed by a literal ff
.
This means you basically ran ...
delete from tablea where col~'^([1-\uffff]|\261|f)+$'
The meaning of \261
depends on your database encoding (check SHOW SERVER_ENCODING;
). If UTF-8
is used, it is U+00B1 ±
. But that should not be your problem.
With 1-\uffff
(includes all ASCII symbols, except for control characters and !"#$%&'()*+,-./0
) I'm pretty sure you deleted something you wanted to keep.