ms-word ms-office line-breaks

Replace "Shift-Enter" line breaks with "Enter" in a Word document using the Microsoft Office API


I have a number of Word documents that will be converted to HTML. Each paragraph in a Word document must be converted to its own <p> element.

After some tests with the Microsoft Office API's SaveAs method to convert the documents to HTML, I found that paragraphs separated by manual line breaks (entered with "Shift-Enter") are not placed in separate <p> elements. Instead, they are grouped together in the same <p> element.

To separate them, I have been trying to replace the "Shift-Enter" line breaks with "Enter" (carriage return) before doing the conversion. However, I couldn't find a suitable way to do the replacement. I tried the WdLineEndingType parameter of the SaveAs method, but it does not seem to affect this issue.


Solution

  • The Word object model exposes a Find object on the Range object, which can search for and replace text.

    The following code replaces every manual line break ("^l") with a paragraph mark ("^p").

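    // oMissing is assumed to be declared as: object oMissing = System.Reflection.Missing.Value;
    Range r = oDoc.Content;
    r.WholeStory();
    // Arguments: FindText, eight unused options (MatchCase through Format), ReplaceWith, Replace.
    r.Find.Execute("^l", ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, "^p", WdReplace.wdReplaceAll);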
    

    Then use SaveAs to convert the Word document to HTML. Each line will then be placed in its own <p> element.
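
Putting the steps together, the whole conversion might look roughly like the sketch below. It is not taken from the original answer: it assumes C# 4 or later (so named arguments can be used and `ref` can be omitted on COM calls), a reference to Microsoft.Office.Interop.Word, and Word 2010 or later for SaveAs2. The file paths are hypothetical, and filtered HTML is only one possible choice of output format.

```csharp
using System;
using Microsoft.Office.Interop.Word;

class DocToHtml
{
    static void Main()
    {
        // Hypothetical paths, used for illustration only.
        string docPath = @"C:\docs\input.docx";
        string htmlPath = @"C:\docs\output.html";

        var app = new Application();
        Document doc = null;
        try
        {
            doc = app.Documents.Open(docPath);

            // Replace every manual line break (^l) with a paragraph mark (^p).
            Range r = doc.Content;
            r.Find.Execute(FindText: "^l", ReplaceWith: "^p",
                           Replace: WdReplace.wdReplaceAll);

            // Assumption: filtered HTML is the desired output format.
            doc.SaveAs2(FileName: htmlPath,
                        FileFormat: WdSaveFormat.wdFormatFilteredHTML);
        }
        finally
        {
            // Close without saving again, then shut down the Word instance.
            if (doc != null) doc.Close(SaveChanges: WdSaveOptions.wdDoNotSaveChanges);
            app.Quit();
        }
    }
}
```

The try/finally block is there so that the Word process is closed even if the replacement or the save fails. Otherwise a hidden WINWORD.EXE instance can be left running.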