Is it possible to find Outlook-specific markup via Capybara/Nokogiri?
Given the following markup (the ERB <% %> tags are processed into regular HTML):
...
<div>
<!--[if gte mso 9]>
<v:rect
xmlns:v="urn:schemas-microsoft-com:vml" fill="true" stroke="false"
style="width:<%= card_width %>px;height:<%= card_header_height %>px;"
>
<v:fill type="tile"
src="<%= avatar_background_url.split('?')[0] %>"
color="<%= background_color %>" />
<v:textbox inset="0,0,0,0">
<![endif]-->
<div>
How can I get the list of <v:fill ../> tags? (Or, failing that, how can I get the whole comment, if finding a tag inside a conditional comment is a problem?)
I have tried the following:
doc.xpath('//v:fill')
*** Nokogiri::XML::XPath::SyntaxError Exception: ERROR: Undefined namespace prefix: //v:fill
Do I need to somehow register the VML namespace?
EDIT - following @ThomasWalpole's approach:
doc.xpath('//comment()').each do |comment_node|
  vml_node_match = /<v\:fill.*src=\"(?<url>http\:[^"]*)"[^>]*\/>/.match(comment_node)
  if vml_node_match
    original_image_uri = URI.parse(vml_node_match['url'])
    vml_tag = vml_node_match[0]
    handle_vml_image_replacement(original_image_uri, comment_node, vml_tag)
  end
end
My handle_vml_image_replacement then ends up calling the following replace_comment_image_src:
def self.replace_comment_image_src(node:, comment:, old_url:, new_url:)
  new_url = new_url.split('?').first # VML does not support URL with query params
  puts "Replacing comment src URL in #{comment} by #{new_url}"
  node.content = node.content.gsub(old_url, new_url)
end
But then it feels like the comment is no longer actually a "comment", and I can sometimes see the HTML as if it had been escaped. Am I using the wrong method to change the comment text with Nokogiri?
Here's the final code that I used for my email interceptor, thanks to @Thomas Walpole and @sschmeck for help along the way.
My goal was to replace images (linking to localhost) in VML markup with globally available images, for testing with services like MOA or Litmus:
doc.xpath('//comment()').each do |comment_node|
  # Note : cannot capture beginning of tag, since it might span across several lines
  src_attr_match = /.*src=\"(?<url>http[s]?\:[^"]*)"[^>]*\/>/.match(comment_node)
  next unless src_attr_match
  original_image_uri = URI.parse(src_attr_match['url'])
  handle_comment_image_replacement(original_image_uri, comment_node)
end
This later calls the following (after picking a URL replacement strategy depending on the source image type):
def self.replace_comment_image_src(node:, old_url:, new_url:)
  new_url = new_url.split('?').first
  node.native_content = node.content.gsub(old_url, new_url)
end
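The matching and query-string stripping can also be tried on a plain string, without Nokogiri. The sketch below uses only the standard library. The sample comment text, the `extract_src_url` and `strip_query` helper names, and the `cdn.example.com` host are assumptions made for illustration; they are not part of the interceptor.

```ruby
require 'uri'

# Equivalent to the pattern above: it anchors only on the src attribute,
# so it still matches when the tag spans several lines.
SRC_ATTR_PATTERN = /.*src="(?<url>https?:[^"]*)"[^>]*\/>/

# Hypothetical sample of what the text of a comment node could look like.
SAMPLE_COMMENT = <<~'VML'
  [if gte mso 9]>
  <v:rect xmlns:v="urn:schemas-microsoft-com:vml" fill="true" stroke="false">
  <v:fill type="tile"
  src="http://localhost:3000/images/bg.png?v=42"
  color="#ffffff" />
  <v:textbox inset="0,0,0,0">
  <![endif]
VML

# Returns the src URL found in the comment text, or nil when there is none.
def extract_src_url(comment_text)
  match = SRC_ATTR_PATTERN.match(comment_text)
  match && match[:url]
end

# Keeps everything before the first "?", since VML does not support
# URLs with query params.
def strip_query(url)
  url.split('?').first
end

old_url = extract_src_url(SAMPLE_COMMENT)
original_image_uri = URI.parse(old_url)

# Assumed replacement strategy: same path, served from a public host.
new_url = strip_query("https://cdn.example.com#{original_image_uri.path}")

puts SAMPLE_COMMENT.gsub(old_url, new_url)
```

Keeping the two helpers free of Nokogiri makes it easy to check the pattern against multi-line tags before wiring it into the interceptor.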