macos sed text-files gnu-sed

How to add a newline after every `{` and `:` character in a file


I want to copy a text file (on macOS, or Unix in general) to stdout, adding a newline after every { and : character.

I tried this, for just the curly bracket character:

sed "s/{/{\n/g" myfile.txt

but it doesn’t seem to work.

Do I need to install the GNU version of sed for this? And if so, how can I add newlines to both characters { and : in one go?


Solution

  • Just backslash the literal newline.

    catalina$ sed 's/[{:]/\
    > /'g <<\_
    > hello: this{is} an example{of}something: poo
    > there
    > _
    hello
     this
    is} an example
    of}something
     poo
    there
    

    If you would like to keep the matched character and add the newline after it, include & in the replacement; it stands for the text that was matched.

    catalina$ sed 's/[{:]/&\
    > /'g <<\_
    > hello: this is {another} example
    > more: newlines!
    > here
    > _
    hello:
     this is {
    another} example
    more:
     newlines!
    here
    

    The details vary between sed versions, though I believe the above should work everywhere. GNU sed adds some conveniences, like the ability to use \n as an abbreviation for newline in the replacement, but this is not portable. In general, I would suggest moving to Awk or Perl if you need to use non-portable sed features.
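
    As a rough sketch of the Awk route mentioned above: the function name add_newlines_awk and the sample input are made up for illustration, and this is only one of several ways to write it. In Awk, "\n" inside a string is a newline and & in the gsub replacement stands for the matched text, so no backslash-newline trick is needed.

    ```shell
    # Hypothetical helper: append a newline after every { and : in the input.
    # Reads the files given as arguments, or stdin when there are none.
    add_newlines_awk() {
        awk '{ gsub(/[{:]/, "&\n"); print }' "$@"
    }

    # Made-up sample input; with a real file you would run: add_newlines_awk myfile.txt
    printf 'hello: this{is} an example\n' | add_newlines_awk
    ```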

    Sometimes you can also use shell features like sed $'s/[{:]/\\\n/g' but this is (for now) not portable to every shell (Bash, Ksh, and Zsh support it, but a plain POSIX sh may not), and it is probably not an improvement in terms of legibility. (In brief, $'...' offers a single-quoted string with "C string" semantics, meaning the shell converts \n to a literal newline, \t to a literal tab, etc.; you then also need to escape a literal backslash with another backslash to prevent the shell from interpreting it as something else. This shell feature is proposed to be included in a future POSIX version, so it will ultimately be portable to any POSIX-conformant shell, but don't hold your breath.)
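
    For completeness, here is a small sketch of the $'...' variant that keeps the matched character by adding &. It assumes a shell that supports this quoting, such as Bash; the function name add_newlines_cstring and the sample input are invented for the example.

    ```shell
    # Assumes a shell with $'...' quoting (Bash, for example).
    # Inside $'...', \\ becomes one backslash and \n becomes a newline, so sed
    # receives a backslash followed by a literal newline in the replacement.
    add_newlines_cstring() {
        sed $'s/[{:]/&\\\n/g' "$@"
    }

    # Made-up sample input for illustration.
    printf 'more: newlines{here}\n' | add_newlines_cstring
    ```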

    In case it's not obvious, [{:] is a regex character class which matches a single character out of the enumeration between the square brackets.
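
    To see the character class in isolation, this sketch swaps each matched character for a visible X instead of a newline. The sample input is made up; the second command is an assumed equivalent that uses two separate substitutions rather than one class.

    ```shell
    # One bracket expression matches either character.
    printf 'a{b:c\n' | sed 's/[{:]/X/g'                 # prints aXbXc

    # The same result with one substitution per character.
    printf 'a{b:c\n' | sed -e 's/{/X/g' -e 's/:/X/g'    # prints aXbXc
    ```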