I have some records like below:
urutan | desc |
---|---|
1432 | AMAN032 - Gunting |
1433 | BENANG156 - Sikat Pintu |
1434 | Oli Bell One AT-D 20W40 |
1435 | Water Refill |
1436 | KABUR001 - Gosok Air |
1437 | Kabel Ties 20 Cm - 50 |
Based on the table above, i have two formats of string in field desc:
as you can see, the correct format has format like this below:
AMAN032 - Gunting
then i split:
[AMAN][032] - [Gunting]
[a word][3 digits number][space][-][space][any words]
.
my goal is I want to SELECT all records which does not match the correct format using PosgreSQL REGEXP in WHERE Clause (in a condition I don't know which urutan). so the result from table above are urutan 1434, 1435, 1437.
Try with the following query:
SELECT *
FROM tab
WHERE NOT desc_ ~ '^[A-Za-z]+[0-9]{3} - [A-zA-Z ]+$'
Regex Explanation:
^
: start of string[A-Za-z]+
: any alphabetical character[0-9]{3}
: three digits-
: space + dash + space[A-zA-Z ]+
: any combination of alphabetical character and space$
: end of stringTry it here.
Note: The hole in this regex may be in the last combination of alphabetical characters. You can either play with it at the provided link to include or exclude characters that you want/don't want to match for that part, or share further details on how that part can be composed of, and I could try improving the pattern matching.