I'm trying to query a table to find all instances where a character repeats at least 5 times in a row.
I've tried:
Select Column
From Table
where Column REGEXP '(.)\1{4,}'
but it returns nothing.
The table includes the following entries that SHOULD be returned:
1.111111111111E31
00000000000000000
xxxxxxxxxxxxxxxxx
Snowflake does not support backreferences in regular expression patterns (known as “squares” in formal language theory); however, backreferences are supported in the replacement string of the REGEXP_REPLACE function.
Second when the backslash is used inside single-quoted string it has to be escaped:
In single-quoted string constants, you must escape the backslash character in the backslash-sequence. For example, to specify
\d
, use\\d
Possible workaround is to write a custom UDF:
CREATE OR REPLACE FUNCTION regexp_test(r TEXT, t TEXT)
RETURNS BOOLEAN
LANGUAGE PYTHON
STRICT
RUNTIME_VERSION = '3.12'
HANDLER = 'main'
AS $$
import re
def main(r,t):
return bool(re.search(r,t))
$$;
WITH cte(Col) AS (
SELECT '1.111111111111E31' UNION ALL
SELECT '00000000000000000' UNION ALL
SELECT 'xxxxxxxxxxxxxxxxx' UNION ALL
SELECT '12345678'
)
SELECT *, regexp_test('(.)\\1{4,}', col)
FROM cte;
Output:
+-------------------+--------------------------------+
| COL | REGEXP_TEST('(.)\\1{4,}', COL) |
+-------------------+--------------------------------+
| 1.111111111111E31 | TRUE |
| 00000000000000000 | TRUE |
| xxxxxxxxxxxxxxxxx | TRUE |
| 12345678 | FALSE |
+-------------------+--------------------------------+