PostgreSQL adding carriage return(next line) from a long field's value

I am having a long value from a field into PostgreSQL database, which has everything in one single line.

With example:

SELECT id, monitor, description
FROM mytable;

Here is the output:

id	monitor	description
25	sensor_1	This is the really long description......

I need to get the description split in multiple lines?

I have tried already the following:

regexp_replace(proofAsText(description), '(.{1,20})', '\1\0x0D', 'g') AS "description"

then

regexp_replace(proofAsText(description), E'[\\n\\r]+', ' ', 'g' ) AS "description"

Unfortunately, none of them effectively work. any suggestion?

Thank you.

Solution

Fixing your `regexp_replace`

Your first attempt was a quite good, but to get the 0x0D interpreted by PostgreSQL you needed an escape string (E'…'),
with the syntax being \x0D instead of 0x0D (or simply \n, but let's stick to your original replacement).

Of course the \ in \1 then has to be escaped, to be passed as is to the regex engine, not interpreted as soon as the escape string.

So:

regexp_replace(description, '(.{1,20})', E'\\1\x0D', 'g')

This is a quite long
 text but with possi
bilities to break it
 up. However one of 
the lines include a 
reallylonglonglongwo
rdthatwillnotfitinas
ingleline

However…

Going further

Why not try to split on spaces, to avoid cutting words in half?

Of course theorically we can have have words longer than 20 characters without a space in them, so we'll keep a special case for them, but then we can just detect words of a non-space, optionally followed by up to 18 characters and ending with another non-space, before a space (or end of text). "optionally", because one-letter words don't have a starting and an ending character.

So:

regexp_replace(description, ' *(?:([^ ]{20})|([^ ](?:.{0,18}[^ ])?)(?: |$))', E'\\1\\2\n', 'g')

This is a quite long
text but with
possibilities to
break it up. However
one of the lines
include a
reallylonglonglongwo
rdthatwillnotfitinas
ingleline

And you can see it in a fiddle to play with.

Regex's sectional view

*(?:([^ ]{20})|([^ ](?:.{0,18}[^ ])?)(?: |$))

the (?:…)s group without capturing (= memory optimization (the regex engine doesn't have to remember them)
each (space) could be replaced by \s to handle every kind of whitespace (including tabs, but end of lines too; that said if we were smart we would handle EOL specially, so that it is preserved);
I'll do this replacement here, to make them appear:
now for the slicing:
- \s* start by eating every space
- (?:…|…) then have either:
  - ([^\s]{20}) twenty non-space (that we capture to \1 thanks to the (…)
  - or (…)(?:\s|$) consisting of:
    - [^\s](?:.{0,18}[^\s])? (captured to \2 thanks to the wrapping (…) not represented here),
      which has a non-space, followed by an optional ((?:…)?) block of anything of at most 18 characters but ending with a non-space
      (hmm, I could have used \S instead of [^\s])
    - followed by a non-captured (?:\s|$) = either a space, or the end of the text
finally we replace each of those found blocks with \1\2 and an EOL: as we will have filled either \1 (20 chars block) or \2 (less than 20 chars that we found before whitespace), \1\2 is either "empty string followed by \2 or \1 followed by empty string".
and the 'g' applies the regex globally, that is, after having eaten the longest matched string (either way less than or 20 characters, and its potential following whitespace), it will start finding a new match; as we have eaten the preceding whitespace, it should be able to start the new line search with the very next non-eaten character, a non-èspace
(and thus I'm thinking that the first * is unnecessary (apart from the first iteration, that can be replaced by an ltrim()), which would allow to simplify by also removing the (?:): ([^ ]{20})|([^ ](?:.{0,18}[^ ])?)(?: |$))

PostgreSQL adding carriage return(next line) from a long field's value

Fixing your regexp_replace

Going further

Regex's sectional view

Fixing your `regexp_replace`