
How to get HTML text without meta information from a QTextDocument


Description

I created a TextArea component in QML and, similar to this example, a DocumentHandler class that holds a pointer to the QQuickTextDocument obtained through the TextArea's textDocument property. I need it to format the text, that is, to make it bold, italic, underlined, struck out, etc.

What I need

I need to get the text with its formatted parts represented as HTML tags.

For example, for Bold text I would like to get <b>Bold text</b>, and for Bold and italic text I would like to get <b><i>Bold and italic text</i></b> (the order of the tags does not matter).

What I tried

I tried the toHtml() function, but it does not suit me because:

  1. It generates a lot of information that I don't need. For example, for Bold text it returned the following result:

     <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
     <html><head><meta name="qrichtext" content="1" /><style type="text/css">
     p, li { white-space: pre-wrap; }
     </style></head><body style=" font-family:'Roboto'; font-size:14px; font-weight:400; font-style:normal;">
     <p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;"><span style=" font-weight:600;">Bold text</span></p></body></html>
  2. Instead of the usual tags that I need (<b>, <i>, etc.), it expresses the formatting through the style attribute of a <span> tag. For example, bold is written as <span style=" font-weight:600;">.

Solution

  • Description

    As far as I can tell, there is currently no way to get text with HTML formatting tags from a QTextDocument without the meta information that toHtml() generates. Therefore, I decided to do this work manually using the QTextCursor class.

    Code

    I have a structure that describes a tag:

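    #include <QString>
    #include <QTextCursor>

    #include <functional>

    // Describes one formatting tag: its opening and closing text, plus a
    // predicate that tells whether the format is active at a cursor position.
    struct Tag
    {
        Tag() = default;
        Tag(const QString& openTag,
            const QString& closeTag,
            const std::function<bool(const QTextCursor& cursor)>& canBeOpened)
            : m_openTag(openTag)
            , m_closeTag(closeTag)
            , m_canBeOpened(canBeOpened)
        {
        }
    
        QString getOpenTag() const
        {
            return m_openTag;
        }
    
        QString getCloseTag() const
        {
            return m_closeTag;
        }
    
        // Returns true if this tag's format applies at the cursor position.
        bool canBeOpened(const QTextCursor& cursor) const
        {
            return m_canBeOpened(cursor);
        }
    
    private:
        QString m_openTag;
        QString m_closeTag;
        std::function<bool(const QTextCursor&)> m_canBeOpened;
    };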
    

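    And I have a std::map<FontFormat, Tag> m_formats (FontFormat is the enum shown below) that associates each format with such a structure. I initialize it as follows:

    // Weights heavier than QFont::Bold are treated as bold too.
    m_formats{ { FontFormat::Bold,          { "<b>", "</b>", [](const QTextCursor& cursor) { return cursor.charFormat().fontWeight() >= QFont::Bold; } } },
               { FontFormat::Italic,        { "<i>", "</i>", [](const QTextCursor& cursor) { return cursor.charFormat().fontItalic(); } } },
               { FontFormat::Underline,     { "<u>", "</u>", [](const QTextCursor& cursor) { return cursor.charFormat().fontUnderline(); } } },
               { FontFormat::Strikethrough, { "<s>", "</s>", [](const QTextCursor& cursor) { return cursor.charFormat().fontStrikeOut(); } } } }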
    

    The most important part is the getFormattedText() function, which uses these Tag objects to return the formatted text. The main idea is to place the tags into the plain text manually: the opening tag goes where a format begins, and the closing tag goes where it ends. Which formatting applies where can be read from a QTextCursor created on the QTextDocument. The result is the following function:

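    // Requires <QString>, <QTextCursor> and <QTextDocument>.
    QString getFormattedText()
    {
        auto cursor{ textCursor() };
        if (cursor.isNull())
        {
            return {};
        }

        const auto plainText{ cursor.document()->toPlainText() };
        QString result;
        auto currentTextFormat{ getTextFormat() }; // every format is "closed" at the start
        for (int i{}; i < plainText.size(); ++i)
        {
            const QChar ch{ plainText.at(i) };
            if (ch == QLatin1Char('\n'))
            {
                // toPlainText() turns paragraph and line separators into '\n';
                // the currently open tags simply stay open across the line break.
                result += QStringLiteral("<br>");
                continue;
            }

            // charFormat() reports the format of the character before the cursor,
            // so position i + 1 gives the format of character i.
            cursor.setPosition(i + 1);
            const auto localTextFormat{ getTextFormat(cursor) };
            if (currentTextFormat != localTextFormat)
            {
                // Close the old combination and open the new one in front of character i.
                result += getClosedFormat(currentTextFormat) + getOpenedFormat(localTextFormat);
                currentTextFormat = localTextFormat;
            }
            // Escape '<', '&', etc. so the text itself cannot be read as markup.
            result += QString(ch).toHtmlEscaped();
        }
        result += getClosedFormat(currentTextFormat);
        return result;
    }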
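    To try the idea outside of Qt, here is a minimal sketch of the same walk in plain C++17. It is a model, not the code above: per-character bit masks (bit i = tag i, in the order Bold, Italic, Underline, Strikethrough) stand in for the formats QTextCursor::charFormat() would report, and the names formattedText, openedFormat, closedFormat and escaped are hypothetical. It also HTML-escapes each character and emits <br> for '\n' without closing the open tags; treat those two details as choices of this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: bit i of a mask means kTagNames[i] is active.
// Order matches FontFormat: Bold, Italic, Underline, Strikethrough.
const std::array<const char*, 4> kTagNames{ "b", "i", "u", "s" };

// Like getOpenedFormat(): opening tags in table order, e.g. b|s -> "<b><s>".
std::string openedFormat(unsigned mask)
{
    std::string result;
    for (std::size_t i = 0; i < kTagNames.size(); ++i)
        if (mask & (1u << i))
            result += std::string("<") + kTagNames[i] + ">";
    return result;
}

// Like getClosedFormat(): each closing tag is prepended, so the order is
// reversed and the tags nest correctly, e.g. b|s -> "</s></b>".
std::string closedFormat(unsigned mask)
{
    std::string result;
    for (std::size_t i = 0; i < kTagNames.size(); ++i)
        if (mask & (1u << i))
            result.insert(0, std::string("</") + kTagNames[i] + ">");
    return result;
}

// Escapes the characters that would otherwise be read as markup.
std::string escaped(char c)
{
    switch (c)
    {
    case '<': return "&lt;";
    case '>': return "&gt;";
    case '&': return "&amp;";
    case '"': return "&quot;";
    default:  return std::string(1, c);
    }
}

// Model of getFormattedText(): whenever the mask changes, close the whole
// previous combination and open the new one in front of the character.
std::string formattedText(const std::string& text, const std::vector<unsigned>& masks)
{
    assert(text.size() == masks.size());
    std::string result;
    unsigned current = 0; // nothing is open at the start
    for (std::size_t i = 0; i < text.size(); ++i)
    {
        if (text[i] == '\n')
        {
            result += "<br>"; // a line break does not change the open tags
            continue;
        }
        if (masks[i] != current)
        {
            result += closedFormat(current) + openedFormat(masks[i]);
            current = masks[i];
        }
        result += escaped(text[i]);
    }
    result += closedFormat(current);
    return result;
}
```

    With B = 1u and S = 1u << 3, formattedText("abc", {B, B | S, S}) returns "<b>a</b><b><s>b</s></b><s>c</s>".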
    

    The currentTextFormat and localTextFormat variables are needed to close one combination of formats and open the next one at the right moment. Such a combination of formats is represented by:

    using TextFormat = std::vector<std::pair<FontFormat, bool>>;
    

    Where FontFormat is:

    enum class FontFormat
    {
        Bold,
        Italic,
        Underline,
        Strikethrough
    };
    

    Functions for getting a TextFormat:

    TextFormat getTextFormat()
    {
        TextFormat textFormat;
        for (const auto& format : m_formats)
        {
            textFormat.push_back({ format.first, false });
        }
        return textFormat;
    }
    
    TextFormat getTextFormat(const QTextCursor& cursor)
    {
        TextFormat textFormat;
        for (const auto& format : m_formats)
        {
            textFormat.push_back({ format.first, format.second.canBeOpened(cursor) });
        }
        return textFormat;
    }
    

    Functions for converting a TextFormat into its opening and closing tags:

    QString getOpenedFormat(const TextFormat& textFormat)
    {
        const auto append = [](QString& result, const Tag& tag) {
            result.push_back(tag.getOpenTag());
        };
        return getFormat(textFormat, append);
    }
    
    QString getClosedFormat(const TextFormat& textFormat)
    {
        const auto append = [](QString& result, const Tag& tag) {
            result.prepend(tag.getCloseTag());
        };
        return getFormat(textFormat, append);
    }
    
    QString getFormat(const TextFormat& textFormat, const std::function<void(QString&, const Tag&)>& append)
    {
        QString result;
        for (const auto& format : textFormat)
        {
            if (format.second)
            {
                const auto fndFontFormat{ m_formats.find(format.first) };
                if (fndFontFormat != m_formats.end())
                {
                    append(result, fndFontFormat->second);
                }
            }
        }
        return result;
    }
    

    For example, take the text abc, where a is bold, b is bold and struck out, and c is only struck out. Each letter has a different combination of formats, so when moving from one letter to the next, the old combination has to be closed and the new one opened.

    Thus abc is converted to: <b>a</b><b><s>b</s></b><s>c</s>.
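    The approach above always closes the whole combination, even tags that stay active. If fewer tags are preferred, one variant (a suggestion, not part of the solution above) keeps a stack of open tags and closes only those above the first tag that is no longer active. Below is a minimal Qt-free sketch of that variant; per-character bit masks stand in for QTextCharFormat, the names kTags and formattedTextMinimal are hypothetical, and escaping and <br> handling are left out for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical tag table: bit i of a mask means kTags[i] is active,
// in the order Bold, Italic, Underline, Strikethrough.
const std::vector<std::string> kTags{ "b", "i", "u", "s" };

// Variant of the walk that keeps a stack of open tags and closes only what it must.
std::string formattedTextMinimal(const std::string& text, const std::vector<unsigned>& masks)
{
    assert(text.size() == masks.size());
    std::string result;
    std::vector<std::size_t> open; // indices into kTags, outermost first

    // Makes the set of open tags equal to the bits set in `mask`.
    const auto switchTo = [&](unsigned mask) {
        // Keep the longest bottom part of the stack whose tags are all still active.
        std::size_t keep = 0;
        while (keep < open.size() && (mask & (1u << open[keep])))
            ++keep;
        // Close everything above it, innermost first, so the nesting stays valid.
        while (open.size() > keep)
        {
            result += "</" + kTags[open.back()] + ">";
            open.pop_back();
        }
        // Open the newly active tags in table order.
        for (std::size_t i = 0; i < kTags.size(); ++i)
        {
            const bool active = (mask & (1u << i)) != 0;
            if (active && std::find(open.begin(), open.end(), i) == open.end())
            {
                result += "<" + kTags[i] + ">";
                open.push_back(i);
            }
        }
    };

    for (std::size_t i = 0; i < text.size(); ++i)
    {
        switchTo(masks[i]);
        result += text[i];
    }
    switchTo(0); // close whatever is still open
    return result;
}
```

    With B = 1u and S = 1u << 3, formattedTextMinimal("abc", {B, B | S, S}) produces <b>a<s>b</s></b><s>c</s>: the <b> opened for a stays open across b. The cost is a little more bookkeeping per character.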