I'm trying to remove the HTML that wraps the RichTextField content. I thought I could do it using "raw_data", but that doesn't seem to work. I could use a regex to strip it, but surely there is a Wagtail/Django way to do this?
for block in post.faq.raw_data:
    print(block['value']['answer'])
Outputs:
<p data-block-key="y925g">The time is almost 4.30</p>
Expected output (just the raw text):
The time is almost 4.30
StructBlock:
class FaqBlock(blocks.StructBlock):
    question = blocks.CharBlock(required=False)
    answer = blocks.RichTextBlock(required=False)
You can do this in Beautiful Soup easily.
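from html import unescape
from bs4 import BeautifulSoup

soup = BeautifulSoup(unescape(html), "html.parser")
# find_all(string=True) is the current spelling of findAll(text=True)
inner_text = ' '.join(soup.find_all(string=True))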
In your case, html = value.answer, which you can pass into a template tag or filter.
EDIT: example filter:
from bs4 import BeautifulSoup
from django import template
from html import unescape

register = template.Library()

@register.filter()
def plaintext(richtext):
    return BeautifulSoup(unescape(richtext), "html.parser").get_text(separator=" ")
BeautifulSoup has a get_text() method that takes a separator; it does the same job as the join statement I wrote earlier. The default separator is the empty string, which joins all the text elements together without a gap.
<h3>Rich Text</h3>
<p>{{ page.intro|richtext }}</p>
<h3>Plain Text</h3>
<p>{{ page.intro|plaintext }}</p>
If you want to retain line breaks, it needs a bit more parsing to replace block elements with a \n. The streamvalue.render_as_block() method does that for you, but there's no method like this for RichTextField since it's just a string. You can find code examples to do this if you need.
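As a rough illustration of that extra parsing, here is a minimal sketch that turns the end of each block element (and each <br>) into a newline. It swaps in Python's standard-library html.parser instead of Beautiful Soup so that it runs without extra dependencies. The names BLOCK_TAGS and plaintext_with_breaks are made up for this example, and the tag list is an assumption rather than a complete list of block-level elements.

```python
from html.parser import HTMLParser

# Assumption: an illustrative, non-exhaustive set of block-level tags.
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6", "blockquote"}


class _PlainTextExtractor(HTMLParser):
    """Collects text nodes and emits a newline where a block element ends."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def plaintext_with_breaks(richtext):
    """Hypothetical helper: strip tags from rich text HTML but keep line breaks."""
    parser = _PlainTextExtractor()
    parser.feed(richtext)
    parser.close()
    # Trim each line and drop empty ones left behind by consecutive block tags.
    lines = [line.strip() for line in "".join(parser.parts).split("\n")]
    return "\n".join(line for line in lines if line)


print(plaintext_with_breaks('<p data-block-key="y925g">The time is almost 4.30</p>'))
```

The function body could be dropped into the plaintext filter above if you want the same behaviour in templates; inline tags such as <b> or <a> are ignored, so their text stays on the same line.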