I am having a long value from a field into PostgreSQL database, which has everything in one single line.
With example:
SELECT id, monitor, description
FROM mytable;
Here is the output:
id | monitor | description |
---|---|---|
25 | sensor_1 | This is the really long description...... |
I need to get the description split in multiple lines?
I have tried already the following:
regexp_replace(proofAsText(description), '(.{1,20})', '\1\0x0D', 'g') AS "description"
then
regexp_replace(proofAsText(description), E'[\\n\\r]+', ' ', 'g' ) AS "description"
Unfortunately, none of them effectively work. any suggestion?
Thank you.
regexp_replace
Your first attempt was a quite good, but to get the 0x0D
interpreted by PostgreSQL you needed an escape string (E'…'
),
with the syntax being \x0D
instead of 0x0D
(or simply \n
, but let's stick to your original replacement).
Of course the \
in \1
then has to be escaped, to be passed as is to the regex engine, not interpreted as soon as the escape string.
So:
regexp_replace(description, '(.{1,20})', E'\\1\x0D', 'g')
This is a quite long
text but with possi
bilities to break it
up. However one of
the lines include a
reallylonglonglongwo
rdthatwillnotfitinas
ingleline
However…
Why not try to split on spaces, to avoid cutting words in half?
Of course theorically we can have have words longer than 20 characters without a space in them, so we'll keep a special case for them, but then we can just detect words of a non-space, optionally followed by up to 18 characters and ending with another non-space, before a space (or end of text). "optionally", because one-letter words don't have a starting and an ending character.
So:
regexp_replace(description, ' *(?:([^ ]{20})|([^ ](?:.{0,18}[^ ])?)(?: |$))', E'\\1\\2\n', 'g')
This is a quite long
text but with
possibilities to
break it up. However
one of the lines
include a
reallylonglonglongwo
rdthatwillnotfitinas
ingleline
And you can see it in a fiddle to play with.
*(?:([^ ]{20})|([^ ](?:.{0,18}[^ ])?)(?: |$))
(?:…)
s group without capturing (= memory optimization (the regex engine doesn't have to remember them)
(space) could be replaced by \s
to handle every kind of whitespace (including tabs, but end of lines too; that said if we were smart we would handle EOL specially, so that it is preserved);\s*
start by eating every space(?:…|…)
then have either:
([^\s]{20})
twenty non-space (that we capture to \1
thanks to the (…)
(…)(?:\s|$)
consisting of:
[^\s](?:.{0,18}[^\s])?
(captured to \2
thanks to the wrapping (…)
not represented here),(?:…)?
) block of anything of at most 18 characters but ending with a non-space\S
instead of [^\s]
)(?:\s|$)
= either a space, or the end of the text
\1\2
and an EOL: as we will have filled either \1
(20 chars block) or \2
(less than 20 chars that we found before whitespace), \1\2
is either "empty string followed by \2
or \1
followed by empty string".'g'
applies the regex globally, that is, after having eaten the longest matched string (either way less than or 20 characters, and its potential following whitespace), it will start finding a new match; as we have eaten the preceding whitespace, it should be able to start the new line search with the very next non-eaten character, a non-èspace *
is unnecessary (apart from the first iteration, that can be replaced by an ltrim()
), which would allow to simplify by also removing the (?:)
: ([^ ]{20})|([^ ](?:.{0,18}[^ ])?)(?: |$)
)