JEditorPane text/html Elements get HTML inside element

I'm creating an HTML editor in Java Swing. It uses a JEditorPane with the text/html MIME type. I have the following HTML structure:

<body>
    <p>This is a <b>BOLD</b> word in a sentence</p>
</body>

When the cursor is placed in that sentence, and someone clicks a "LIST" button, the HTML gets modified by creating a new list with the paragraph that contained the cursor as the first list item. Like so:

<body>
    <ol>
        <li>
            <p>This is a <b>BOLD</b> word in a sentence</p>
        </li>
    </ol>
</body>

I have this working to the point that the list element is created, but the bold tags are lost: the new list item contains the paragraph's text, but none of it is bold any more.

I need some way to get the inner or outer HTML of an Element object, in this case the paragraph element, so that I can copy its contents in their entirety, including the bold tags. So far I can only copy the text inside the <p> tags, which doesn't include the bold tags.

Here is my code so far. It is inside an extended editor pane class; htmlDoc_ is the HTMLDocument for the editor pane.

public void toggledListButton() {
    
    // turning the paragraph into a list
    
    // get the paragraph element, cursor should always be inside
    // a paragraph somewhere
    Element elem = htmlDoc_.getParagraphElement( this.getCaretPosition() );
    
    int caretPos = this.getCaretPosition();
    int elemStart = elem.getStartOffset();
    int elemEnd = elem.getEndOffset();

    String elemText = "" ;
    try {
        elemText = htmlDoc_.getText(elemStart, elemEnd - elemStart);
    } catch (BadLocationException e1) {
        e1.printStackTrace();
    }

    try {
        htmlDoc_.setOuterHTML(elem, "<ol><li><p>" + elemText + "</p></li></ol>");
    } catch (BadLocationException e1) {
        e1.printStackTrace();
    } catch (IOException e1) {
        e1.printStackTrace();
    }
    
    // the amount of text doesn't change, so we can just restore the caret position
    this.setCaretPosition(caretPos);
    this.requestFocusInWindow();
    
}

If I could somehow get the inner HTML of the "elem" Element, I think I would have what I need to insert into the new list. Either that or maybe pass the element to JSoup and extract the HTML that way, but I can't figure out how to pass the Element into JSoup.

EDIT-----------------

As per the comment below about iterating through elements, I changed the code to loop through each child of the paragraph element ("elem") and build the paragraph's HTML that way. The problem is that it doesn't detect the tags as separate elements; it only detects 3 text/leaf elements.

    String paragraphHTML = "";
    for (int i = 0; i < elem.getElementCount(); i++) {
      
        Element child = elem.getElement(i);
        if (child.isLeaf()) {
            try {
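                // read only this leaf's text, not the whole document
                int childStart = child.getStartOffset();
                paragraphHTML += child.getDocument().getText(childStart, child.getEndOffset() - childStart);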
            } catch (BadLocationException e) {
                e.printStackTrace();
            }
            
        } else {
            paragraphHTML += "<" + child.getName() + ">";   
        }
        
    }
    System.out.println("paragraphHTML=" + paragraphHTML);

paragraphHTML is output as just the text, without the tags. How can I detect the tags as well? Thanks.

Solution

If I could somehow get the inner HTML of the "elem" Element,

You can use the HTMLEditorKit.write(...) method to write out the text/tags of the paragraph at the caret position:

import java.awt.*;
import java.io.*;
import java.util.*;
import javax.swing.*;
import javax.swing.event.*;
import javax.swing.text.*;

public class EditorPaneExtract extends JPanel implements CaretListener
{
    private JEditorPane editor;
    private JTextArea partial;
    private JLabel extracted;

    public EditorPaneExtract() throws Exception
    {
        setLayout( new BorderLayout() );

        String text = "<html><head><title>Title</title></head><body><pre>123456789</pre><p>Line one with <b>bold</b> text</p><p>Line two with <i>italic</i> text</p></body></html>";

        editor = new JEditorPane();
        editor.setContentType( "text/html" );
        editor.setText( text );
        editor.addCaretListener( this );

        JScrollPane scrollPane = new JScrollPane( editor );
        scrollPane.setPreferredSize( new Dimension(400, 120) );
        add(scrollPane, BorderLayout.PAGE_START);

        JTextArea full = new JTextArea(20, 25);
        full.setEditable( false );
        add(new JScrollPane(full), BorderLayout.LINE_START);

        full.setText( editor.getText() );

        partial = new JTextArea(20, 25);
        partial.setEditable( false );
        add(new JScrollPane(partial), BorderLayout.LINE_END);

        extracted = new JLabel(" ");
        add(extracted, BorderLayout.PAGE_END);
    }

    @Override
    public void caretUpdate(CaretEvent e)
    {
        try
        {
            int offset = editor.getCaretPosition();

            StyledDocument doc = (StyledDocument)editor.getDocument();
            Element paragraph = doc.getParagraphElement(offset);
            int start = paragraph.getStartOffset();
            int end = paragraph.getEndOffset();

            StringWriter writer = new StringWriter();
            editor.getEditorKit().write(writer,  editor.getDocument(), start, end - start);

            partial.setText( writer.toString() );

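            // Strip the "html" and "body" wrapper tags that the writer always adds.
            // The fixed offsets assume the writer's default indentation and "\n" line endings.
            StringBuilder sb = new StringBuilder( writer.toString() );
            sb.delete(sb.length() - 19, sb.length() -1);
            sb.delete(0, 20);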

            extracted.setText( sb.toString() );
         }
         catch (Exception e1)
         {
             e1.printStackTrace();
         }
    }

    public static void main(String[] args) throws Exception
    {
        JFrame frame = new JFrame();
        frame.setDefaultCloseOperation( JFrame.EXIT_ON_CLOSE );
        frame.add( new EditorPaneExtract() );
        frame.pack();
        frame.setLocationRelativeTo( null );
        frame.setVisible(true);
    }
}

In the above code:

the top component is the editor pane
the left component is the full HTML text
the right component is the extracted HTML as written by the editor kit. Note the "html" and "body" tags are always included.
the bottom component is the extracted HTML without the "html" and "body" tags.

Just move the caret from line to line to see the difference.
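
To tie this back to the list button in the question, here is a minimal sketch of how the same write(...) call might feed setOuterHTML(...). The class and method names (ParagraphToList, bodyContent, paragraphHtml, wrapInList, convert) are made up for illustration and are not part of Swing. As an assumption, the sketch builds the HTMLDocument directly from an HTMLEditorKit so that it runs without a window, and it removes the "html" and "body" wrapper by searching for the body tags rather than by fixed offsets. The exact whitespace in the output depends on the writer.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.swing.text.BadLocationException;
import javax.swing.text.Element;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class ParagraphToList {

    /** Returns whatever sits between the opening and closing body tags, trimmed. */
    public static String bodyContent(String html) {
        int open = html.indexOf("<body");
        int close = html.lastIndexOf("</body>");
        if (open < 0 || close < 0) {
            return html.trim(); // no wrapper found, hand the input back
        }
        int start = html.indexOf('>', open) + 1;
        return html.substring(start, close).trim();
    }

    /** Writes the paragraph at the given offset, inline tags such as <b> included. */
    public static String paragraphHtml(HTMLEditorKit kit, HTMLDocument doc, int offset)
            throws IOException, BadLocationException {
        Element p = doc.getParagraphElement(offset);
        int start = p.getStartOffset();
        // Clamp so the range never runs past the end of the document.
        int end = Math.min(p.getEndOffset(), doc.getLength());
        StringWriter out = new StringWriter();
        kit.write(out, doc, start, end - start);
        return bodyContent(out.toString());
    }

    /** Replaces the paragraph at the offset with <ol><li>paragraph</li></ol>. */
    public static void wrapInList(HTMLEditorKit kit, HTMLDocument doc, int offset)
            throws IOException, BadLocationException {
        String inner = paragraphHtml(kit, doc, offset);
        Element p = doc.getParagraphElement(offset);
        doc.setOuterHTML(p, "<ol><li>" + inner + "</li></ol>");
    }

    /** Parses html, wraps the paragraph that contains marker, returns the new HTML. */
    public static String convert(String html, String marker) {
        try {
            HTMLEditorKit kit = new HTMLEditorKit();
            HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
            kit.read(new StringReader(html), doc, 0);
            // Stand-in for the caret position: the offset of some known text.
            int offset = doc.getText(0, doc.getLength()).indexOf(marker);
            if (offset < 0) {
                throw new IllegalArgumentException("marker not found: " + marker);
            }
            wrapInList(kit, doc, offset);
            StringWriter out = new StringWriter();
            kit.write(out, doc, 0, doc.getLength());
            return out.toString();
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><body><p>This is a <b>BOLD</b> word in a sentence</p></body></html>";
        System.out.println(convert(html, "BOLD"));
    }
}
```

Inside the question's toggledListButton(), the equivalent call would presumably be something like wrapInList((HTMLEditorKit) getEditorKit(), htmlDoc_, getCaretPosition()), followed by the existing caret and focus handling.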