I have a number of Word documents that need to be converted to HTML. The requirement is that each paragraph in a Word document be converted to its own <p> element.
After some tests using the Microsoft Office API's SaveAs method to convert the documents to HTML, I realized that paragraphs separated by manual line breaks (entered with "Shift-Enter") are not placed in separate <p> elements. Instead, they are grouped into the same <p> element.
To separate them, I have been trying to replace the "Shift-Enter" line breaks with "Enter" paragraph breaks (carriage returns) before doing the conversion. However, I couldn't find a suitable way to do the replacement. I tried the WdLineEndingType parameter of the SaveAs method, but it does not seem to have any effect on this issue.
The Word Office API provides a Find object on the Range object, which lets you search for and replace strings.
The following code replaces the manual line breaks ("^l") with paragraph marks ("^p").
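// oMissing is assumed to be declared earlier as: object oMissing = System.Reflection.Missing.Value;
// "^l" matches a manual line break (Shift-Enter); "^p" is a paragraph mark (Enter).
Range r = oDoc.Content;
r.WholeStory();
// Arguments 2 to 9 (MatchCase through Format) are left at their defaults via oMissing.
r.Find.Execute("^l", ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, "^p", WdReplace.wdReplaceAll);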
Then use SaveAs to convert the Word document to HTML. Each line will now be placed in its own <p> element.
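For reference, the whole sequence (open, replace, save as HTML) might look like the sketch below. It is not a tested implementation and rests on several assumptions: C# 4 or later (so named and optional arguments can be used with COM interop), a project reference to Microsoft.Office.Interop.Word, and Word installed on the machine. The class name DocToHtml, the variables inputPath and outputPath, and the choice of wdFormatFilteredHTML (rather than wdFormatHTML) are illustrative and not taken from the question.

```csharp
using System.IO;
using Word = Microsoft.Office.Interop.Word;

static class DocToHtml
{
    static void Main(string[] args)
    {
        // Hypothetical input/output paths; full paths avoid surprises with
        // Word's own notion of the current directory.
        string inputPath = Path.GetFullPath(args[0]);
        string outputPath = Path.GetFullPath(args[1]);

        var app = new Word.Application();
        Word.Document doc = null;
        try
        {
            doc = app.Documents.Open(FileName: inputPath);

            // Replace every manual line break (^l) with a paragraph mark (^p).
            Word.Find find = doc.Content.Find;
            find.ClearFormatting();
            find.Replacement.ClearFormatting();
            find.Execute(FindText: "^l", ReplaceWith: "^p",
                         Replace: Word.WdReplace.wdReplaceAll);

            // Assumption: filtered HTML is wanted; use wdFormatHTML otherwise.
            doc.SaveAs(FileName: outputPath,
                       FileFormat: Word.WdSaveFormat.wdFormatFilteredHTML);
        }
        finally
        {
            // The casts avoid the method/event name ambiguity on Close and Quit.
            if (doc != null) ((Word._Document)doc).Close(SaveChanges: false);
            ((Word._Application)app).Quit();
        }
    }
}
```

Closing the document with SaveChanges set to false leaves the original .doc file untouched, so the replacement only affects the HTML output.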