I want to remove every HTML tag from my text with AWK. I tried this regex:
/[<.*.>]/
and want the matching text removed wherever it occurs in any field. I've been experimenting with sub and substr, but I can't work out the correct logic.
Input text:
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation<br/><div style="margin-top:6px"><b>veniam:</b></div><br/><div style="margin-top:6px"><b>Confort:< /b></div>Comenzi volan; Cruise-control; Servodirectie;<br/>
Expected Output:
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation veniam: Confort: Comenzi volan; Cruise-control; Servodirectie;
If you're not really parsing HTML and just want to strip every <...> sequence from a text file, you can treat each tag as a record separator and print the records back to back with an empty ORS.
This requires GNU awk, since it relies on a multi-character (regex) RS:
$ awk -v RS='<[^>]+>' -v ORS= '1' file
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitationveniam: Confort:Comenzi volan; Cruise-control; Servodirectie;
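
If GNU awk isn't available, or you want each tag replaced by a space as in the expected output, a per-line gsub() is one option. Note that the regex from the question, /[<.*.>]/, is a bracket expression: it matches a single character that is <, ., * or >, not a whole tag, which is why <[^>]+> is used instead. The following is only a sketch: the file name input.txt and the shortened sample line are assumptions for illustration, and it does not handle a tag that spans a line break.

```shell
# Hypothetical sample file; input.txt and its contents are assumptions for this sketch.
printf '%s\n' 'quis nostrud exercitation<br/><div style="margin-top:6px"><b>veniam:</b></div><br/>Comenzi volan;<br/>' > input.txt

# Works line by line in any POSIX awk (no regex RS needed).
out=$(awk '{
    gsub(/<[^>]+>/, " ")           # replace every <...> run with one space
    gsub(/ +/, " ")                # collapse runs of spaces left by adjacent tags
    sub(/^ /, ""); sub(/ $/, "")   # trim a leading or trailing space
    print
}' input.txt)

printf '%s\n' "$out"
```

Replacing with a space rather than the empty string keeps words from running together where a tag was the only thing separating them (for example "exercitation" and "veniam:").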