I have column in Postgres db which has text in char varying
data type. The text includes an uri which contains file name and resembles as below;
The file is a file of \\88-77-99-666.abc.example.com\Folder1\Folder2\Folder3\Folder4\20221122\12345678.PDF [9bc8rer55c655f4cb5df763c61862d3fdde9557b0] is the sha1 of the file.
I am trying to get the file name 12345678.PDF
and date 20221122
from the text content. However, regexp_replace either gives me everything till file name or everything after filename. I am trying to get only file name
1>> Regexp_replace(data, '.+\\', '')
Yields filename and everything after it
2>> Regexp_replace(data, '\[.*', '')
Yields filename and everything after it
If I capture two groups like below I get same result as 1.
Regexp_replace(data, '.+\\|\[', '')
How can I substitute 2 groups and only get filename? Or what is the better way to achieve this? And I need to get the date value but if I can figure this out maybe I will be able to apply the learning for to extract date value. Thanks for your time.
You can use
SELECT REGEXP_MATCHES(
'The file is a file of \\88-77-99-666.abc.example.com\Folder1\Folder2\Folder3\Folder4\20221122\2779780.PDF [9bc8rer55c655f4cb5df763c61862d3fdde9557b0] is the sha1 of the file.',
'([^[:space:]\\/]+)\s+\[([^][]+)') AS Result;
See the DB fiddle, result:
Details:
([^[:space:]\\/]+)
- Group 1: one or more chars other than \
, /
and whitespace\s+
- one or more whitespaces\[
- a [
char([^][]+)
- Group 2: one or more chars other than [
and ]
.