I have an XML file that looks something like this:
<xml>
<trkseg>
<note>
<to>A</to>
<from>B</from>
<body>
keep this
</body>
</trkseg>
<trkseg>
</note>
...
</trkseg>
</xml>
I want to remove every occurrence of the following pair of lines. This combination of tags can occur more than once in the file:
</trkseg>
<trkseg>
Any tips on how to fix this?
What I expected was this:
<xml>
<trkseg>
<note>
<to>A</to>
<from>B</from>
<body>
keep this
</body>
</note>
...
</trkseg>
</xml>
I tried this sed command, but it doesn't work the way I want:
sed -i '' -e '/<\/trkseg>/,/<trkseg>/d' my-file.xml
I get this result:
<xml>
<trkseg>
<note>
<to>A</to>
<from>B</from>
<body>
keep this
</body>
</note>
...
It can be done with GNU sed.
Sample file:
<xml>
<trkseg>
one
two
</trkseg>
<trkseg>
three
four
</trkseg>
</xml>
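sed script (-z reads the whole file as a single record; the loop at -A/-B removes every </trkseg> except the last one, then -C removes every <trkseg> except the first one and prints the result):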
sed -znr '{
:-A s/<[\/]trkseg>/&/2;t-B;b-C
:-B s/[[:space:]]*<[\/]trkseg>//1;t-A
:-C s/[[:space:]]*<trkseg>//2g;p
}' file
Output:
<xml>
<trkseg>
one
two
three
four
</trkseg>
</xml>
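If the goal is only to drop each </trkseg> that is directly followed by <trkseg>, a single substitution may be enough. The following is a minimal sketch, not a tested replacement for the script above. It assumes GNU sed (for -z) and that nothing but whitespace separates the two tags; the function name join_trkseg is made up for illustration.

```shell
# Hypothetical helper: reads XML on stdin, writes the joined result to stdout.
# Usage (file names are placeholders): join_trkseg < my-file.xml > joined.xml
join_trkseg() {
  # -z: treat the whole input as one record so the pattern can span newlines (GNU sed).
  # -E: use extended regular expressions.
  # The substitution deletes each "</trkseg>", any whitespace after it,
  # the "<trkseg>" that follows, and any whitespace after that.
  sed -z -E 's#</trkseg>[[:space:]]*<trkseg>[[:space:]]*##g'
}
```

Because the pattern only matches a closing tag immediately followed by an opening tag, the first <trkseg> and the last </trkseg> are left in place.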