bash sed code-transformation

Trying to get generic sed multi-line pattern match and substitution script to work


A generic approach to the problem raised by this poster is described here, in section 4.23.3 of the sed FAQ. It appears to offer a method for matching any multi-line block of text and replacing it with any other block. The technique is referred to as the "sliding-window" technique.
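To make the idea concrete, here is a minimal hand-written sketch of a sliding window, assuming a fixed two-line block. It is a simplified illustration and not the script from the FAQ; the function name sliding_window_demo and the sample input are invented for the example.

```shell
#!/bin/sh
# Simplified two-line sliding window (illustrative only).
#   $!N  append the next input line, unless already on the last line
#   s    replace the two-line block "b<newline>c" with "X"
#   P    print the first line of the window
#   D    delete that line and restart the cycle without reading input
sliding_window_demo() {
    sed -e '$!N' -e 's/b\nc/X/' -e 'P' -e 'D'
}

# Prints: a, X, d (one per line)
printf 'a\nb\nc\nd\n' | sliding_window_demo
```

The P/D pair is what lets the window advance one line at a time, so a block that starts on any line can still be matched.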

I believe the script below faithfully recreates the scenario described, and it incorporates the sed script to demonstrate that the approach is workable.

#!/bin/bash

DBG=1

###
### Code segment to be replaced
###
file1="File1.cpp"
rm -f "${file1}"
cat >"${file1}" <<"EnDoFiNpUt"
void Component::initialize()
{
    my_component = new ComponentClass();
}
EnDoFiNpUt

test ${DBG} -eq 1 && echo "fence 1"

###
### Code segment to be used as replacement
###
file2="File2.cpp"
rm -f "${file2}"
cat >"${file2}" <<"EnDoFiNpUt"
void Component::initialize()
{
    if (doInit)
    {
        my_component = new ComponentClass();
    }
    else
    {
        my_component.ptr = null;
    }
}
EnDoFiNpUt

test ${DBG} -eq 1 && echo "fence 2"

###
### Create demo input file
###
testfile="Test_INPUT.cpp"
rm -f "${testfile}"
{
    echo "
other code1()
{
    doing other things
    doing more things
    doing extra things
} 
"
    cat "${file1}"

echo "
other code2()
{
    creating other things
    creating more things
    creating extra things
} 
"
} >>"${testfile}"

test ${DBG} -eq 1 && echo "fence 3"

###
### Create editing specification file
###
{
    cat "${file1}"
    echo "###REPLACE_BY###"
    cat "${file2}"
} >findrep.txt

test ${DBG} -eq 1 && echo "fence 4"


###
### sed script to create editing instructions to apply above editing specification file
###
cat >"blockrep.sed" <<"EnDoFiNpUt"
#SOURCE:    https://www.linuxtopia.org/online_books/linux_tool_guides/the_sed_faq/sedfaq4_013.html
#
# filename: blockrep.sed
#   author: Paolo Bonzini
# Requires:
#    (1) blocks to find and replace, e.g., findrep.txt
#    (2) an input file to be changed, input.file
#
# blockrep.sed creates a second sed script, custom.sed,
# to find the lines above the ###REPLACE_BY### separator, globally
# replacing them with the lower block of text. GNU sed
# is recommended but not required for this script.
#
# Loop on the first part, accumulating the `from' text
# into the hold space.
:a
/^###REPLACE_BY###$/! {
   # Escape slashes, backslashes, the final newline and
   # regular expression metacharacters.
   s,[/\[.*],\\&,g
   s/$/\\/
   H
   #
   # Append N cmds needed to maintain the sliding window.
   x
   1 s,^.,s/,
   1! s/^/N\
/
   x
   n
   ba
}
#
# Change the final backslash to a slash to separate the
# two sides of the s command.
x
s,\\$,/,
x
#
# Until EOF, gather the substitution into hold space.
:b
n
s,[/\],\\&,g
$! s/$/\\/
H
$! bb
#
# Start the RHS of the s command without a leading
# newline, add the P/D pair for the sliding window, and
# print the script.
g
s,/\n,/,
s,$,/\
P\
D,p
#---end of script---
EnDoFiNpUt

test ${DBG} -eq 1 && echo "fence 5"


sed --debug -nf blockrep.sed findrep.txt >custom.sed
test ${DBG} -eq 1 && echo "fence 6"

if [ -s custom.sed ]
then
    more custom.sed
    echo -e "\t Hit return to continue ..." >&2
    read k <&2
else
    echo -e "\t Failed to create 'custom.sed'.  Unable to proceed!\n" >&2
    exit 1
fi

testout="Test_OUTPUT.cpp"

sed -f custom.sed "${testfile}" >"${testout}"
test ${DBG} -eq 1 && echo "fence 7"

if [ -s "${testout}" ]
then
    more "${testout}"
else
    echo -e "\t Failed to create '${testout}'.\n" >&2
    exit 1
fi

Unfortunately, what they presented doesn't seem to work. I wish there were something like bash's "set -x" that reports each sed command to stderr as it executes, but I haven't found anything like that.

The execution log for the above is as follows:

fence 1
fence 2
fence 3
fence 4
fence 5
sed: file blockrep.sed line 19: unterminated `s' command
fence 6
     Failed to create 'custom.sed'.  Unable to proceed!

Maybe an expert out there can resolve the logic error in the imported blockrep.sed script, because I can't wrap my head around it well enough to fix it, even with all the comments provided.

I freely admit that both my knowledge and my usage of sed are very limited. I can't follow how the "blockrep.sed" script does what it claims; I only know that, in findrep.txt, everything before the separator string "###REPLACE_BY###" is to be replaced by everything after that same separator.

In my view, the approach identified by the linuxtopia guide would have broad application and be beneficial for many, including the OP and myself.
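On the wish for something like "set -x": one portable option, sketched here as a suggestion rather than anything taken from the FAQ, is sed's own "l" command. It prints the current pattern space in an unambiguous form, with embedded newlines shown as \n and the end marked with $. The helper name trace_window and the sample input are made up for illustration.

```shell
#!/bin/sh
# Trace a two-line window with the POSIX "l" command.
#   -n   suppress automatic printing, so only the trace is shown
#   $!N  append the next line, unless already on the last line
#   l    print the pattern space unambiguously (\n for newline, $ at end)
#   D    drop the first line of the window and restart the cycle
trace_window() {
    sed -n -e '$!N' -e 'l' -e 'D'
}

# Prints:
#   one\ntwo$
#   two\nthree$
#   three$
printf 'one\ntwo\nthree\n' | trace_window
```

Because "l" writes to stdout, it suits a scratch run for inspecting a script, not a run whose output is being saved as another script.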


Solution

  • I resorted to splitting the blockrep.sed script into discrete steps to see if I could identify where it broke down. On the surface that made no logical difference, but it did produce a well-formed script that creates a usable custom.sed, though only after I removed the --debug option from the blockrep.sed run. Removing it is required because the debug output is not sent to stderr; it is written inline to stdout, so it ends up inside custom.sed. I don't know enough to classify that as a bug.

    NOTE: I've also added command-line options to specify names of files for input, output, old_pattern, new_pattern, divider, among others.

    The now modified and working version of the script is as follows:

    #!/bin/bash
    
    reportFiles ()
    {
        ls -l "${fileSrchPat}" "${fileReplPat}" "findrep.txt" 2>&1
        ls -l "blockrep.sed" "custom.sed" "custom.err" 2>&1
        ls -l "${fileBefore}" 2>&1
        ls -l "${fileOutput}" 2>&1
    }
    
    DBG=0
    DBGs=0
    dumA=1 ;
    dumB=1 ;
    fileSrchPat=""
    fileReplPat=""
    divider="----REPLACE_BY----"
    doReview=0
    fileBefore=""
    fileOutput=""
    
    while [ $# -gt 0 ]
    do
        case $1 in
            --debug )
                DBG=1 ;
                shift ;;
            --debug_sed )
                DBGs=1 ;
                shift ;;
            --verbose )
                set -x
                shift ;;
            --old_pattern )
                dumA=0 ;
                fileSrchPat="$2" ;
                if [ ! -s "${fileSrchPat}" ]
                then
                    echo -e "\n File '${fileSrchPat}' not found.\n Bye!\n"
                    exit 1
                fi ;
                shift ; shift ;;
            --new_pattern )
                dumA=0 ;
                fileReplPat="$2" ;
                if [ ! -s "${fileReplPat}" ]
                then
                    echo -e "\n File '${fileReplPat}' not found.\n Bye!\n"
                    exit 1
                fi ;
                shift ; shift ;; 
            --pattern_sep )
                divider="$2" ;
                shift ; shift ;;
            --input )
                dumB=0 ;
                fileBefore="$2" ;
                if [ ! -s "${fileBefore}" ]
                then
                    echo -e "\n File '${fileBefore}' not found.\n Bye!\n"
                    exit 1
                fi ;
                shift ; shift ;;
            --output )
                fileOutput="$2" ;
                # Ask before overwriting only when the output file already exists.
                if [ -e "${fileOutput}" ]
                then
                    echo -e "\n File '${fileOutput}' already exists.  Overwrite ? [y|N] => \c"
                    read goAhead
                    if [ -z "${goAhead}" ] ; then  goAhead="N" ; fi
                    case ${goAhead} in
                        y* | Y* ) rm -vf "${fileOutput}" ;;
                        * ) echo -e "\n\t Process abandoned.\n Bye!\n" ; exit 1 ;;
                    esac
                fi ;
                shift ; shift ;;
            --review ) doReview=1 ; shift ;;
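            * ) echo -e "\n Invalid option '$1' used on command line.  Valid options: [ --debug | --debug_sed | --verbose | --old_pattern {textfile1} | --new_pattern {textfile2} | --pattern_sep {divider} | --input {file} | --output {file} | --review ] \n Bye!\n" >&2 ; exit 1 ;;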
        esac
    done
    
    ###
    ### Code segment to be replaced
    ###
    if [ -z "${fileSrchPat}" ]
    then
        fileSrchPat="File1.cpp"
        rm -f "${fileSrchPat}"
        cat >"${fileSrchPat}" <<"EnDoFiNpUt"
    void Component::initialize()
    {
        my_component = new ComponentClass();
    }
    EnDoFiNpUt
    
    fi
    test ${DBG} -eq 1 && echo -e "\n\t ======== fence 1"
    
    
    ###
    ### Code segment to be used as replacement
    ###
    if [ -z "${fileReplPat}" ]
    then
        fileReplPat="File2.cpp"
        rm -f "${fileReplPat}"
        cat >"${fileReplPat}" <<"EnDoFiNpUt"
    void Component::initialize()
    {
        if (doInit)
        {
            my_component = new ComponentClass();
        }
        else
        {
            my_component.ptr = null;
        }
    }
    EnDoFiNpUt
    
    fi
    test ${DBG} -eq 1 && echo -e "\n\t ======== fence 2"
    
    
    if [ -z "${fileBefore}" ]
    then
    ###
    ### Create demo input file
    ###
        fileBefore="Test_INPUT.cpp"
        rm -f "${fileBefore}"
        {
        echo "
    other code1()
    {
        doing other things
        doing more things
        doing extra things
    } 
    "
        cat "${fileSrchPat}"
    
        echo "
    other code2()
    {
        creating other things
        creating more things
        creating extra things
    } 
    "
        } >>"${fileBefore}"
    
    fi
    test ${DBG} -eq 1 && echo -e "\n\t ======== fence 3"
    
    ###
    ### Create editing specification file
    ###
    {
        cat "${fileSrchPat}"
        echo "${divider}"
        cat "${fileReplPat}"
    } >findrep.txt
    
    test ${DBG} -eq 1 && echo -e "\n\t ======== fence 4"
    
    
    ###
    ### sed script to create editing instructions to apply above editing specification file
    ###
    test ${DBG} -eq 1 && rm -fv "blockrep.sed" || rm -f "blockrep.sed"
    
    cat >"blockrep.sed" <<"EnDoFiNpUt"
    #SOURCE:    https://www.linuxtopia.org/online_books/linux_tool_guides/the_sed_faq/sedfaq4_013.html
    #
    # filename: blockrep.sed
    #   author: Paolo Bonzini
    # 
    # Modified by: Eric Marceau, Feb 2023
    #
    # Requires:
    #    (1) blocks to find and replace, e.g., findrep.txt
    #    (2) an input file to be changed, input.file
    #
    # blockrep.sed creates a second sed script, custom.sed,
    # to find the lines above the divider line, globally
    # replacing them with the lower block of text. GNU sed
    # is recommended but not required for this script.
    #
    # Loop on the first part, accumulating the `from' text
    # into the hold space.
    #
    ##############################################################################
    ### Reworked Discretized version of coding
    ##############################################################################
    #
    ##############################################################################
    ### Begin of capture - SEARCH pattern
    ##############################################################################
    :markerA
    EnDoFiNpUt
    
    #echo "/^----REPLACE_BY----\$/! {" >> "blockrep.sed"
    echo "/^${divider}\$/! {" >> "blockrep.sed"
    
    cat >>"blockrep.sed" <<"EnDoFiNpUt"
    #
    # Escape slashes
        s,[/],\\&,g
    #
    # Escape backslashes
        s,[\],\\&,g
    #
    # Escape regular expression metacharacters
        s,[[],\\&,g
        s,[.],\\&,g
        s,[*],\\&,g
    #
    # Escape the final newline:
    #  add a backslash to the end of the line (so that sed does
    #  not treat the newline as the end of the command)
        s,$,\\,
    #
    # APPEND  -  PATTERN space to HOLD space
        H
    #
    # Sequence to APPEND "N" cmds needed to maintain the sliding window.
    # \\ swap contents - HOLD and PATTERN space
        x
    #
    # If first line, begin constructing sed command for pattern match and replace
        1 s,^.,s/,
    #
    # If not first line, add line with "N" 
    #  i.e. give instruction to "APPEND the next line of input into the pattern space"
        1! s,^,N\
    ,
    # // swap contents again - HOLD and PATTERN space
        x
    #
    # COPY  -  next line of input into PATTERN space
        n
    #
    # branch/jump to label markerA
        b markerA
    }
    #
    ##############################################################################
    ### End of capture - SEARCH pattern
    ##############################################################################
    #
    #
    ##############################################################################
    ### Begin of capture - REPLACEMENT pattern
    ##############################################################################
    #
    # \\ swap contents - HOLD and PATTERN space
        x
    #
    # Change the final backslash to a slash to separate the
    # two sides of the s command.
        s,\\$,/,
    #
    # // swap contents again - HOLD and PATTERN space
        x
    #
    # Until EOF, gather the REPLACEMENT TEXT into the hold space.
    :markerB
        n
    #
    # Escape slashes
        s,[/],\\&,g
    #
    # Escape backslashes
        s,[\],\\&,g
    #
    # If not last line, append a backslash to escape the newline that follows.
        $! s,$,\\,
    #
    # APPEND  -  PATTERN space to HOLD space
        H
    #
    # If not last line, branch/jump to markerB
        $! b markerB
    #
    ##############################################################################
    ### End of capture - REPLACEMENT pattern
    ##############################################################################
    #
    #
    # Start the Right-Hand Side (RHS) of the "s" command without a leading newline,
    # add the P/D pair for the sliding window, and
    # print the script.
    #
    # COPY  -  HOLD space to PATTERN space
        g
        s,/\n,/,
    #
    # (P) Print up to the first embedded newline of the current pattern space.
    #  then
    # (D) If  pattern  space  contains no newline, start a normal new cycle as if 
    #   the d command was issued.  Otherwise, delete text in the pattern space 
    #   up to the first newline, and restart cycle with the resultant pattern space,
    #   without reading a new line of input.
    #  then
    # (p) Print the current pattern space.
        s,$,/\
        P\
        D,p
    #---end of script---
    EnDoFiNpUt
    
    test ${DBG} -eq 1 && echo -e "\n\t ======== fence 5"
    
    test ${DBG} -eq 1 && rm -fv custom.sed custom.err || rm -f custom.sed custom.err
    
    if [ ${DBGs} -eq 1 ]
    then
        echo -e "\n\t NOTE:  debug mode active for 'sed' command ..."
        sed --debug -f blockrep.sed findrep.txt >custom.sed 2>custom.err
    else
        sed -nf blockrep.sed findrep.txt >custom.sed 2>custom.err
    fi
    test ${DBG} -eq 1 && echo -e "\n\t ======== fence 6"
    
    
    if [ -s custom.err ]
    then
        if [ ${doReview} -eq 1 ]
        then
            cat custom.err
        fi
    fi
    test ${DBG} -eq 1 && echo -e "\n\t ======== fence 7"
    
    
    if [ -s custom.sed ]
    then
        if [ ${doReview} -eq 1 ]
        then
            more custom.sed
            echo -e "\n\t Hit return to continue ..." >&2
            read k <&2
        fi
        if [ ${DBGs} -eq 1 ]
        then
            more custom.sed
            echo -e "\n =============  End of Review - 'custom.sed' containing execution debug reporting  =============" >&2
            echo -e "\n\t 'custom.sed' is not in usable form due to '--debug' messaging." >&2
            echo -e   "\t Abandoning before attempting final transformation.\n Bye!\n" >&2
            exit 2
        fi
    else
        echo -e "\t Failed to create 'custom.sed'.  Unable to proceed!\n" >&2
        exit 1
    fi
    test ${DBG} -eq 1 && echo -e "\n\t ======== fence 8"
    
    
    if [ -z "${fileOutput}" ]
    then
        fileOutput="Test_OUTPUT.cpp"
    fi
    rm -f "${fileOutput}"
    sed -f custom.sed "${fileBefore}" >"${fileOutput}"
    test ${DBG} -eq 1 && echo -e "\n\t ======== fence 9"
    
    
    if [ -s "${fileOutput}" ]
    then
        test ${DBG} -eq 1 && echo -e "\n\t ======== fence 10"
        if [ ${doReview} -eq 1 ]
        then
            more "${fileOutput}"
        fi
        if [ ${DBG} -eq 1 ]
        then
            reportFiles
            if [ ${dumA} -eq 1 ]
            then
                rm -fv "${fileSrchPat}" "${fileReplPat}" 2>&1
            fi
            if [ ${dumB} -eq 1 ]
            then
                rm -fv "${fileBefore}" 2>&1
            fi
            rm -fv "findrep.txt" "custom.sed" "custom.err" "blockrep.sed" 2>&1
        else
            if [ ${dumA} -eq 1 ]
            then
                rm -f "${fileSrchPat}" "${fileReplPat}" 2>&1
            fi
            if [ ${dumB} -eq 1 ]
            then
                rm -f "${fileBefore}" 2>&1
            fi
            rm -f "findrep.txt" "custom.sed" "custom.err" "blockrep.sed" 2>&1
        fi | awk '{ printf("\t %s\n", $0 ) ; }' >&2
    else
        echo -e "\t Failed to create '${fileOutput}'.\n" >&2
        reportFiles | awk '{ printf("\t %s\n", $0 ) ; }' >&2
        exit 1
    fi
    
    
    exit
    

    Executing with the following command (using the default built-in demo case):

    ./script.sh --debug --review
    

    The resulting session output is

         ======== fence 1
         ======== fence 2
         ======== fence 3
         ======== fence 4
         ======== fence 5
         ======== fence 6
         ======== fence 7
    N
    N
    N
    s/void Component::initialize()\
    {\
        my_component = new ComponentClass();\
    }/void Component::initialize()\
    {\
        if (doInit)\
        {\
            my_component = new ComponentClass();\
        }\
        else\
        {\
            my_component.ptr = null;\
        }\
    }/
        P
        D
    
         Hit return to continue ...
    
         ======== fence 8
         ======== fence 9
         ======== fence 10
    
    other code1()
    {
        doing other things
        doing more things
        doing extra things
    } 
    
    void Component::initialize()
    {
        if (doInit)
        {
            my_component = new ComponentClass();
        }
        else
        {
            my_component.ptr = null;
        }
    }
    
    other code2()
    {
        creating other things
        creating more things
        creating extra things
    } 
    
     -rw-rw-r-- 1 ericthered ericthered  71 Feb  4 16:05 File1.cpp
     -rw-rw-r-- 1 ericthered ericthered 130 Feb  4 16:05 File2.cpp
     -rw-rw-r-- 1 ericthered ericthered 220 Feb  4 16:05 findrep.txt
     -rw-rw-r-- 1 ericthered ericthered 3604 Feb  4 16:05 blockrep.sed
     -rw-rw-r-- 1 ericthered ericthered    0 Feb  4 16:05 custom.err
     -rw-rw-r-- 1 ericthered ericthered  229 Feb  4 16:05 custom.sed
     -rw-rw-r-- 1 ericthered ericthered 240 Feb  4 16:05 Test_INPUT.cpp
     -rw-rw-r-- 1 ericthered ericthered 299 Feb  4 16:06 Test_OUTPUT.cpp
     removed 'File1.cpp'
     removed 'File2.cpp'
     removed 'Test_INPUT.cpp'
     removed 'findrep.txt'
     removed 'custom.sed'
     removed 'custom.err'
     removed 'blockrep.sed'
    

    This is the result that was initially intended. Success!
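
A possible explanation for the original "unterminated s command" error, offered as an assumption rather than something confirmed in the answer above: inside a bracket expression, POSIX treats the sequence "[." as the start of a collating symbol that must be closed by ".]". The original command s,[/\[.*],\\&,g contains "[." inside the brackets, so the bracket expression may never be considered closed, and the s command then looks unterminated. Splitting the class into [[], [.] and [*] removes that sequence. The sketch below keeps a single substitution and instead moves "[" to the end of the class; escape_search_line is a hypothetical helper name, not part of blockrep.sed.

```shell
#!/bin/sh
# Escape the characters that blockrep.sed escapes in the search block:
# slash, backslash, dot, star and opening bracket.
# The "[" is placed last in the class so that the sequence "[." never
# appears inside the bracket expression.
escape_search_line() {
    printf '%s\n' "$1" | sed 's,[/\.*[],\\&,g'
}

# Prints: a\[0]\.b = c\/d \* e;
escape_search_line 'a[0].b = c/d * e;'
```

If this reading is right, the one-line form and the split form used in the working script should behave the same; the split version has the advantage of being easier to comment step by step.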