bash awk sed include cat

Replacing filename placeholder with file contents in sed


I'm trying to write a basic script to compile HTML file includes. The premise goes like this:

I have 3 files:

test.html

<div>
   @include include1.html

   <div>content</div>

   @include include2.html
</div>

include1.html

<span>
   banana
</span>

include2.html

<span>
   apple
</span>

My desired output would be:

output.html

<div>
   <span>
      banana
   </span>

   <div>content</div>

   <span>
      apple
   </span>
</div>

I've tried the following:

  1. sed "s|@include \(.*\)|$(cat \1)|" test.html >output.html
    This returns: cat: 1: No such file or directory

  2. sed "s|@include \(.*\)|cat \1|" test.html >output.html
    This runs, but the replacement is inserted as literal text:

    output.html

    <div>
       cat include1.html
    
       <div>content</div>
    
       cat include2.html
    </div>
    

Any ideas on how to run cat inside sed using group substitution? Or is there perhaps another solution?


Solution

  • I wrote this 15-20 years ago to recursively include files, and it's included in the article I wrote about how/when to use getline, under "Applications" then "d)". I've now tweaked it to work with your specific "@include" directive, to indent the included text to match the "@include" indentation, and to guard against infinite recursion (e.g. file A includes file B and file B includes file A):

    $ cat tst.awk
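    function read(file,indent) {
        # A file that is already being read further up the call chain
        # means the includes form a cycle.
        if ( isOpen[file]++ ) {
            print "Infinite recursion detected" | "cat>&2"
            exit 1
        }
    
        while ( (getline < file) > 0) {
            if ($1 == "@include") {
                # Carry the directive's leading whitespace down as extra indent.
                match($0,/^[[:space:]]+/)
                read($2,indent substr($0,1,RLENGTH))
            } else {
                print indent $0
            }
        }
        close(file)
    
        delete isOpen[file]
    }
    
    BEGIN{
       read(ARGV[1],"")
       exit
    }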
    

    $ awk -f tst.awk test.html
    <div>
       <span>
          banana
       </span>
    
       <div>content</div>
    
       <span>
          apple
       </span>
    </div>
    

    Note that if include1.html itself contained an @include ... directive then it'd be honored too, and so on. Look:

    $ for i in test.html include?.html; do printf -- '-----\n%s\n' "$i"; cat "$i"; done
    -----
    test.html
    <div>
       @include include1.html
    
       <div>content</div>
    
       @include include2.html
    </div>
    -----
    include1.html
    <span>
       @include include3.html
    </span>
    -----
    include2.html
    <span>
       apple
    </span>
    -----
    include3.html
    <div>
       @include include4.html
    </div>
    -----
    include4.html
    <span>
       grape
    </span>
    

    $ awk -f tst.awk test.html
    <div>
       <span>
          <div>
             <span>
                grape
             </span>
          </div>
       </span>
    
       <div>content</div>
    
       <span>
          apple
       </span>
    </div>
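
To see the recursion safeguard mentioned at the top in action, here is a minimal sketch that makes two files include each other. The file names a.html and b.html and the scratch directory are assumptions made up for this illustration; tst.awk is recreated verbatim so the snippet runs on its own.

```shell
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d) && cd "$dir" || exit 1

# Same script as tst.awk above, written out so this snippet is self-contained.
cat > tst.awk <<'EOF'
function read(file,indent) {
    if ( isOpen[file]++ ) {
        print "Infinite recursion detected" | "cat>&2"
        exit 1
    }
    while ( (getline < file) > 0) {
        if ($1 == "@include") {
            match($0,/^[[:space:]]+/)
            read($2,indent substr($0,1,RLENGTH))
        } else {
            print indent $0
        }
    }
    close(file)
    delete isOpen[file]
}
BEGIN{
    read(ARGV[1],"")
    exit
}
EOF

# Hypothetical pair of files that include each other (a -> b -> a).
printf '%s\n' '<div>'  '   @include b.html' '</div>'  > a.html
printf '%s\n' '<span>' '   @include a.html' '</span>' > b.html

# Capture the exit status without tripping "set -e" if it is enabled.
if out=$(awk -f tst.awk a.html 2>err.txt); then status=0; else status=$?; fi

echo "exit status: $status"
cat err.txt
```

The run should stop with exit status 1 and the "Infinite recursion detected" message on stderr as soon as b.html tries to pull a.html back in. Whatever was printed before the cycle was hit has already gone to stdout.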
    

    With a non-GNU awk I'd expect it to fail after many levels of recursion with a "too many open files" error, since each level of recursion keeps its file open. So get gawk if you need to go deeper than that; otherwise you'd have to write your own file management code.
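
One possible shape for that file management code, as a rough sketch rather than a tested drop-in replacement: read each file into a local array and close it before recursing, so only one file is open at any time. The names tst2.awk, active, lines and fld are assumptions introduced here. The sketch also matches leading blanks with [ \t] instead of [[:space:]], to sidestep awks that lack POSIX character classes.

```shell
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d) && cd "$dir" || exit 1

cat > tst2.awk <<'EOF'
# Hypothetical variant of read(): the names after the extra spaces in the
# parameter list are locals, so every recursion level gets its own copy.
function read(file, indent,    lines, n, i, fld, line) {
    if ( active[file]++ ) {
        print "Infinite recursion detected" | "cat>&2"
        exit 1
    }

    # Slurp the whole file, then close it before any nested include is read.
    n = 0
    while ( (getline line < file) > 0 ) {
        lines[++n] = line
    }
    close(file)

    for (i = 1; i <= n; i++) {
        split(lines[i], fld)            # default splitting skips leading blanks
        if ( fld[1] == "@include" ) {
            match(lines[i], /^[ \t]+/)  # RLENGTH is -1 when nothing matches
            read(fld[2], indent substr(lines[i], 1, RLENGTH))
        } else {
            print indent lines[i]
        }
    }

    delete active[file]
}

BEGIN {
    read(ARGV[1], "")
    exit
}
EOF

# The three files from the question.
printf '%s\n' '<div>' '   @include include1.html' '' \
    '   <div>content</div>' '' '   @include include2.html' '</div>' > test.html
printf '%s\n' '<span>' '   banana' '</span>' > include1.html
printf '%s\n' '<span>' '   apple'  '</span>' > include2.html

awk -f tst2.awk test.html > output.html
cat output.html
```

The trade-off is memory: every file on the current include chain is held in an array until its includes have been expanded. For HTML fragments like these that should rarely matter.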