Tags: asp.net, .net, vb.net, ms-word, word-automation

VB.NET document generation: handling greater-than and less-than symbols in Word


We are using the TinyMCE editor to store rich text in an MS SQL database.

When we use the "<" and ">" symbols, TinyMCE converts them into the HTML entities &lt; and &gt;. For example: <p>&lt;This is some test information then sometime I use this&gt;</p>

We are trying to export these symbols to a Microsoft Word document using document automation; however, the symbols do not appear in the document.

    ' Requires: Imports System.Net (for WebUtility)
    Function PreFormatHTML(ByVal html As String) As String
        If String.IsNullOrEmpty(html) Then Return html

        ' Decode HTML entities such as &lt; and &gt; back into < and >
        Return WebUtility.HtmlDecode(html)
    End Function

    Dim SumRng As Word.Range = objWordDoc.Bookmarks.Item("bSummary").Range
    SumRng.Text = PreFormatHTML(GeneralComponent.CheckReadNull(SqlReader.Item("Summary")))

This also doesn't work. I'm using Word 2013 and TinyMCE text editor.

Any suggestions?


Solution

  • Without seeing the full HTML I can only make an assumption; however, I would suggest using WebUtility.HtmlDecode:

    Converts a string that has been HTML-encoded for HTTP transmission into a decoded string.

    This is how you would use it:

    html = WebUtility.HtmlDecode(html)
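
    As a quick check outside Word, here is a minimal console sketch (the module name DecodeSketch is made up for illustration) that runs HtmlDecode on the sample string from the question. It assumes only that System.Net is available. It shows that the entities become < and >, while the surrounding <p> tags are left in place, because HtmlDecode decodes entities and does not strip markup.

    ```vbnet
    Imports System
    Imports System.Net

    Module DecodeSketch
        Sub Main()
            ' Sample value taken from the question.
            Dim stored As String = "<p>&lt;This is some test information then sometime I use this&gt;</p>"

            ' HtmlDecode turns entities into characters; it does not remove tags.
            Dim decoded As String = WebUtility.HtmlDecode(stored)

            ' Prints: <p><This is some test information then sometime I use this></p>
            Console.WriteLine(decoded)
        End Sub
    End Module
    ```

    If the <p> tags should not end up in the document, they would need to be removed in a separate step; that is outside what HtmlDecode does.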
    

    This is how I tested it with Word:

    Dim s As String = "&lt;this is some text and I'm wondering what to do&gt;"
    
    Dim wrd As New Word.Application
    Dim doc As Word.Document = wrd.Documents.Add()
    Dim para As Word.Paragraph = doc.Content.Paragraphs.Add()
    
    para.Range.Text = WebUtility.HtmlDecode(s)
    

    This is what the text looks like in my Document:

    [Screenshot of the resulting Word document]

    Edited as per OP's comment:

    Dim s As String = "<p>&lt;This is some test information then sometime I use this&gt;</p>"
    
    Dim wrd As New Word.Application
    Dim doc As Word.Document = wrd.Documents.Add()
    Dim para As Word.Paragraph = doc.Content.Paragraphs.Add()
    
    para.Range.Text = WebUtility.HtmlDecode(s)
    

    This code produces the following output in my Document:

    [Screenshot of the resulting Word document]

    Edited as per OP's update to question:

    I have created a document called test.docx and added a bookmark called bSummary. I have done this in an attempt to replicate the OP's code.

    Dim s As String = "<p>&lt;This is some test information then sometime I use this&gt;</p>"
    
    Dim wrd As New Word.Application
    Dim doc As Word.Document = wrd.Documents.Open("C:\test.docx")
    
    Dim SumRng As Word.Range = doc.Bookmarks.Item("bSummary").Range
    SumRng.Text = PreFormatHTML(s)
    

    The output is the same as above. This leads me to think that whatever is passed into PreFormatHTML is not what you think it is. Is GeneralComponent.CheckReadNull(SqlReader.Item("Summary")) passing the following string into PreFormatHTML: <p>&lt;This is some test information then sometime I use this&gt;</p>?
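
    One way to check what actually reaches PreFormatHTML is to print the raw value next to its decoded form. The sketch below is only an illustration: the DecodeDiagnostics module, the DecodeFully helper, and the double-encoded sample value are all assumptions, not something taken from the OP's data. It covers one hypothetical cause, a value that was encoded twice (for example "&amp;lt;"), where a single HtmlDecode call would still leave "&lt;" in the text.

    ```vbnet
    Imports System
    Imports System.Net

    Module DecodeDiagnostics
        ' Hypothetical helper: decodes repeatedly until the string stops changing,
        ' so a double-encoded value such as "&amp;lt;" ends up as "<".
        ' The pass limit is an arbitrary safety cap.
        Function DecodeFully(ByVal html As String) As String
            If String.IsNullOrEmpty(html) Then Return html

            Dim previous As String
            Dim current As String = html
            Dim passes As Integer = 0

            Do
                previous = current
                current = WebUtility.HtmlDecode(previous)
                passes += 1
            Loop While current <> previous AndAlso passes < 5

            Return current
        End Function

        Sub Main()
            ' Assumed sample: a value that was HTML-encoded twice before being stored.
            Dim raw As String = "<p>&amp;lt;test&amp;gt;</p>"

            Console.WriteLine("Raw:   " & raw)
            Console.WriteLine("Once:  " & WebUtility.HtmlDecode(raw))
            Console.WriteLine("Fully: " & DecodeFully(raw))
        End Sub
    End Module
    ```

    In the real code, the value read from SqlReader.Item("Summary") would take the place of the sample string. If the "Once" and "Fully" lines differ, the stored value was encoded more than once.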

    The OP has confirmed that the HTML is returned from PreFormatHTML as expected. The issue seems to be linked to the document. It may be related to the version of Word Interop that the OP is using: I'm using the Microsoft Word 16.0 Object Library, whilst the OP is using the Microsoft Word 15.0 Object Library.