ruby-on-rails, ruby, ruby-2.2

Replace video tags from HTML string


The HTML string is:

"<div>\r\n<video controls=\"controls\" height=\"313\" id=\"video201643154436\" poster=\"/uploads/ckeditor/pictures/18/content_56883622_18f242e114.jpg\" width=\"500\"><source src=\"/uploads/ckeditor/attachments/23/newtons_law.mp4\" type=\"video/mp4\" />Your browser doesn&#39;t support video.<br />\r\nPlease download the file: <a href=\"/uploads/ckeditor/attachments/23/newtons_law.mp4\">video/mp4</a></video>\r\n</div>\r\n\r\n<div>test description</div>\r\n\r\n<div>\r\n<div>\r\n<video controls=\"controls\" height=\"300\" id=\"video201644152011\" poster=\"\" width=\"400\"><source src=\"/uploads/ckeditor/attachments/24/test.mp4\" type=\"video/mp4\" />Your browser doesn&#39;t support video.<br />\r\nPlease download the file: <a href=\"/uploads/ckeditor/attachments/24/test.mp4\">video/mp4</a></video>\r\n</div>\r\n\r\n<p>&nbsp;</p>\r\n</div>\r\n"

I want to replace every video tag, including its content and child tags, with [[ Video ]].

The expected output is:

"<div>\r\n[[ Video ]]\r\n</div>\r\n\r\n<div>test description</div>\r\n\r\n<div>\r\n<div>\r\n[[ Video ]]\r\n</div>\r\n\r\n<p>&nbsp;</p>\r\n</div>\r\n"

I have tried using the regex /<video\s(.*?)<\/video(?=[>])>/, but it's not working properly.
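The sketch below illustrates why the attempt fails on this input. In Ruby, `.` does not match a newline unless the `m` flag is set, and each `<video>` element above contains a `\r\n` between `<br />` and `Please download`, so the lazy `.*?` can never reach `</video>`. The variable names and the shortened copy of the first video element are made up for illustration.

```ruby
# Shortened copy of one <video> element from the question; note the \r\n inside it.
html = "<div>\r\n<video controls=\"controls\" width=\"500\">" \
       "<source src=\"/uploads/test.mp4\" type=\"video/mp4\" />" \
       "Your browser doesn&#39;t support video.<br />\r\n" \
       "Please download the file: <a href=\"/uploads/test.mp4\">video/mp4</a></video>\r\n</div>"

attempt = /<video\s(.*?)<\/video(?=[>])>/   # the regex from the question
with_m  = /<video\s(.*?)<\/video(?=[>])>/m  # the same regex with the m flag

# Without m, "." stops at "\n", so nothing matches and the string is unchanged.
puts html.gsub(attempt, "[[ Video ]]") == html   # prints true

# With m, "." also matches newlines, so the whole element is replaced.
puts html.gsub(with_m, "[[ Video ]]").inspect    # prints "<div>\r\n[[ Video ]]\r\n</div>"
```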


Solution

  • I think you need to replace these two exact strings, as well as the content between the tags.

    First, the opening and closing strings:

    "<video "
    
    "</video>"
    
    puts html_text.gsub("<video ","[[ video ]] ").gsub('</video>',"[[ video ]]")
    

    This replaces the opening and closing tags, but the attributes and the inner content are still left in place:

    <div>
    [[ video ]] controls="controls" height="313" id="video201643154436" poster="/uploads/ckeditor/pictures/18/content_56883622_18f242e114.jpg" width="500"><source src="/uploads/ckeditor/attachments/23/newtons_law.mp4" type="video/mp4" />Your browser doesn&#39;t support video.<br />
    Please download the file: <a href="/uploads/ckeditor/attachments/23/newtons_law.mp4">video/mp4</a>[[ video ]]
    </div>
    
    <div>test description</div>
    
    <div>
    <div>
    [[ video ]] controls="controls" height="300" id="video201644152011" poster="" width="400"><source src="/uploads/ckeditor/attachments/24/test.mp4" type="video/mp4" />Your browser doesn&#39;t support video.<br />
    Please download the file: <a href="/uploads/ckeditor/attachments/24/test.mp4">video/mp4</a>[[ video ]]
    </div>
    
    <p>&nbsp;</p>
    </div>
    

    Or, with a regular expression:

    puts html_text.gsub(/<\/?video[\s>]/, "[[ video ]]")
    
    <div>
    [[ video ]]controls="controls" height="313" id="video201643154436" poster="/uploads/ckeditor/pictures/18/content_56883622_18f242e114.jpg" width="500"><source src="/uploads/ckeditor/attachments/23/newtons_law.mp4" type="video/mp4" />Your browser doesn&#39;t support video.<br />
    Please download the file: <a href="/uploads/ckeditor/attachments/23/newtons_law.mp4">video/mp4</a>[[ video ]]
    </div>
    
    <div>test description</div>
    
    <div>
    <div>
    [[ video ]]controls="controls" height="300" id="video201644152011" poster="" width="400"><source src="/uploads/ckeditor/attachments/24/test.mp4" type="video/mp4" />Your browser doesn&#39;t support video.<br />
    Please download the file: <a href="/uploads/ckeditor/attachments/24/test.mp4">video/mp4</a>[[ video ]]
    </div>
    
    <p>&nbsp;</p>
    </div>
    

    Finally, to remove each tag together with everything inside it, match the whole element in one pattern. The catch is the \r\n inside each element: by default . does not match a newline, so use these modifiers:

    /.*/m         multiline: . matches newline
    /.*/i         ignore case
    /.*/x         extended: ignore whitespace in pattern
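A quick sketch of what each modifier does, using toy strings rather than the HTML from the question (the variable names are illustrative only):

```ruby
no_m   = "a\nb" =~ /a.b/        # nil: without m, "." does not match "\n"
with_m = "a\nb" =~ /a.b/m       # 0: with m, "." matches "\n" as well
icase  = "<VIDEO>" =~ /<video>/i    # 0: i ignores case
spaced = "<video>" =~ /< video >/x  # 0: x ignores literal whitespace in the pattern

p [no_m, with_m, icase, spaced]     # prints [nil, 0, 0, 0]
```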
    

    Putting it all together, the regular expression is:

    puts html_text.gsub(/<video\s.*?<\/video>/mix, "[[ video ]]")
    

    Result:

    <div>
    [[ video ]]
    </div>
    
    <div>test description</div>
    
    <div>
    <div>
    [[ video ]]
    </div>
    
    <p>&nbsp;</p>
    </div>
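As a self-contained check, here is a sketch that wraps the final regex in a small helper (the name `replace_videos` and its `placeholder` parameter are hypothetical) and compares the result against the exact input and expected output from the question, using its `[[ Video ]]` placeholder. Only `m` is strictly required for this input; `i` additionally catches uppercase `<VIDEO>` tags, and `x` is dropped because the pattern contains no literal whitespace, so it would have no effect.

```ruby
# Hypothetical helper: replace every <video ...>...</video> element with a placeholder.
# m lets "." cross the \r\n inside each element; the lazy *? keeps each match to one element.
def replace_videos(html, placeholder = "[[ Video ]]")
  html.gsub(/<video\s.*?<\/video>/mi, placeholder)
end

html = "<div>\r\n<video controls=\"controls\" height=\"313\" id=\"video201643154436\" poster=\"/uploads/ckeditor/pictures/18/content_56883622_18f242e114.jpg\" width=\"500\"><source src=\"/uploads/ckeditor/attachments/23/newtons_law.mp4\" type=\"video/mp4\" />Your browser doesn&#39;t support video.<br />\r\nPlease download the file: <a href=\"/uploads/ckeditor/attachments/23/newtons_law.mp4\">video/mp4</a></video>\r\n</div>\r\n\r\n<div>test description</div>\r\n\r\n<div>\r\n<div>\r\n<video controls=\"controls\" height=\"300\" id=\"video201644152011\" poster=\"\" width=\"400\"><source src=\"/uploads/ckeditor/attachments/24/test.mp4\" type=\"video/mp4\" />Your browser doesn&#39;t support video.<br />\r\nPlease download the file: <a href=\"/uploads/ckeditor/attachments/24/test.mp4\">video/mp4</a></video>\r\n</div>\r\n\r\n<p>&nbsp;</p>\r\n</div>\r\n"

expected = "<div>\r\n[[ Video ]]\r\n</div>\r\n\r\n<div>test description</div>\r\n\r\n<div>\r\n<div>\r\n[[ Video ]]\r\n</div>\r\n\r\n<p>&nbsp;</p>\r\n</div>\r\n"

puts replace_videos(html) == expected   # prints true
```

The lazy `.*?` matters here: a greedy `.*` with `m` would match from the first `<video` all the way to the last `</video>`, replacing the "test description" div along with both videos.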