[SOLVED] Using regex in Postgres for special characters

Using regex in Postgres for special characters

I'm using in Postgres 17.1 on Ubuntu.

I want to understand the regex for characters - what is the difference between:

1. delete from tablea where  col~'^[\u2611-\uffffff]+$'
2. delete from tablea where  col~'^[\2611-\uffffff]+$'

I meant to delete some rows with symbols and non English with (1) but I used reg ex #2. Did I delete something important?

And when to use:

 \\unnnn \\xnn \\nnnn

and what are the ranges for each?

Which of those is ASCII and which is Unicode?

Solution

See official documentation on escape sequences in regexes.
For quick experiments, try SELECT substring(E'put string here' from 'put regex here');

`\NNN` vs. `\xNN` vs. `\uNNNN`

\47 and \047 stand for a char with octal value 47_oct = 4⋅8¹ + 7⋅8⁰ = 39
\x27, \x027, \x0027, ... stand for a char with hexadecimal value 27_hex = 2⋅16¹ + 7⋅16⁰ = 39
In this example, all these variants stand for char ' and can be used interchangeably.

\u0053 is just like \x53, \x053, or \x0053, but uses exactly 4 hexadecimal digits. This may be beneficial when followed by literal characters.
Example: You want a string for 'A. To circumvent issues with the '-character inside a 'single-quoted string', you cannot write \x27A or \x0027A, as the A would be interpreted as part of the number in both cases. Instead, you can write \u0027A.

\NNN is similar in this regard. It uses exactly 2-3 octal digits.
\1234 is escape sequence \123 followed by the literal char 4.

I used `\2611` instead of `\u2611`. Did i delete something important?

Yes, probably.

\2611 is \261 followed by a literal 1.
\uffffff is \xffff followed by a literal ff.
This means you basically ran ...

delete from tablea where col~'^([1-\uffff]|\261|f)+$'

The meaning of \261 depends on your database encoding (check SHOW SERVER_ENCODING;). If UTF-8 is used, it is U+00B1 ±. But that should not be your problem.

With 1-\uffff (includes all ASCII symbols, except for control characters and !"#$%&'()*+,-./0) I'm pretty sure you deleted something you wanted to keep.

Using regex in Postgres for special characters

\NNN vs. \xNN vs. \uNNNN

I used \2611 instead of \u2611. Did i delete something important?

`\NNN` vs. `\xNN` vs. `\uNNNN`

I used `\2611` instead of `\u2611`. Did i delete something important?