I have a file with lines that contain:
<li><b> Some Text:</b> More Text </li>
I want to remove the HTML tags and replace the </b> tag with a dash so it becomes like this:
Some Text:- More Text
I'm trying to use sed, but I can't find the proper regex combination.
If you want to strip all HTML tags but replace only the </b> tag with a -, you can chain two simple sed commands with a pipe:
cat your_file | sed 's|</b>|-|g' | sed 's|<[^>]*>||g' > stripped_file
This passes the file's contents to the first sed command, which replaces each </b> with a -. Its output is then piped to a second sed that replaces all remaining HTML tags with empty strings. The final output is saved to the new file stripped_file.
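To check what the two-step pipeline produces, here is a minimal sketch that runs it on the sample line from the question. The variable names line and result are only illustrative, and the brackets in the printed output are there to make any leftover whitespace visible.

```shell
# Hypothetical sample input, copied from the line shown in the question.
line='<li><b> Some Text:</b> More Text </li>'

# Step 1: turn each closing </b> into a dash.
# Step 2: delete every remaining tag (anything from < up to the next >).
result=$(printf '%s\n' "$line" | sed 's|</b>|-|g' | sed 's|<[^>]*>||g')

# Print the result between brackets so leading/trailing spaces are visible.
printf '[%s]\n' "$result"
```

Note that the spaces that sat inside the tags in the source line are kept, so the result starts and ends with a space. The order of the two steps matters: if the generic tag-stripping expression ran first, there would be no </b> left to turn into a dash.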
Using a method similar to the other answer from @Steve, you could also use sed's -e option to chain expressions into a single (non-piped) command; by adding -i, you can also read and overwrite the contents of your original file in place, without the need for cat or a new file:
sed -i -e 's|</b>|-|g' -e 's|<[^>]*>||g' your_file
This performs the same replacements as the piped commands above, but this time it overwrites the contents of the input file directly. To save to a new file instead, remove the -i and add > stripped_file to the end (or whatever file name you choose).
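As a rough sketch of the "save to a new file" variant, the snippet below uses a scratch file created with mktemp (a stand-in for your_file, so no real data is touched) and redirects the output instead of using -i. The last two -e expressions are an optional addition that is not part of the answer above: they trim leading and trailing whitespace so the result matches the output asked for in the question. Avoiding -i here also sidesteps a portability difference: GNU sed accepts a bare -i, while the BSD/macOS sed expects a backup suffix argument after it (for example -i '').

```shell
# Hypothetical scratch input file standing in for your_file.
tmp=$(mktemp)
printf '%s\n' '<li><b> Some Text:</b> More Text </li>' > "$tmp"

# All expressions run in order on each line inside a single sed process.
# The last two (optional, an assumption beyond the original answer) trim
# the whitespace left behind at the start and end of the line.
sed -e 's|</b>|-|g' \
    -e 's|<[^>]*>||g' \
    -e 's/^[[:space:]]*//' \
    -e 's/[[:space:]]*$//' \
    "$tmp" > "$tmp.out"

# Read the new file back and show it between brackets.
out=$(cat "$tmp.out")
printf '[%s]\n' "$out"

# Clean up the scratch files.
rm -f "$tmp" "$tmp.out"
```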