I use Python-Markdown to render user-generated content. I'd like to turn images from external sources into links.
I have a list of storages:
storages = ['foo.com', 'bar.net']
and I need to replace
![](http://external.com/image.png)
with something like:
[http://external.com/image.png](http://external.com/image.png)
if the host is not in storages.
I tried editing the Markdown text before saving it to the database, but that is not a good solution: the user may want to edit their data later and would discover that it had been modified. So I want to do the replacement at render time.
One solution to your question is demonstrated in this tutorial:
from urllib.parse import urlparse

from markdown.extensions import Extension
from markdown.treeprocessors import Treeprocessor


class InlineImageProcessor(Treeprocessor):
    def __init__(self, md, hosts):
        super().__init__(md)
        self.hosts = hosts

    def is_unknown_host(self, url):
        # Relative URLs have an empty netloc and are treated as local.
        url = urlparse(url)
        return url.netloc and url.netloc not in self.hosts

    def run(self, root):
        for element in root.iter('img'):
            # Copy the attributes: clear() below may empty the original dict.
            attrib = dict(element.attrib)
            if self.is_unknown_host(attrib['src']):
                tail = element.tail
                element.clear()
                # Turn the <img> into an <a> that points at the image.
                element.tag = 'a'
                element.set('href', attrib.pop('src'))
                element.text = attrib.pop('alt')
                element.tail = tail
                # Carry over any remaining attributes (e.g. title).
                for k, v in attrib.items():
                    element.set(k, v)


class ImageExtension(Extension):
    def __init__(self, **kwargs):
        # self.config must be defined before the base class reads kwargs.
        self.config = {'hosts': [[], 'List of approved hosts']}
        super().__init__(**kwargs)

    def extendMarkdown(self, md):
        # Priority 15 runs after the inline processor has built the <img> elements.
        md.treeprocessors.register(
            InlineImageProcessor(md, hosts=self.getConfig('hosts')),
            'inlineimageprocessor',
            15
        )
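The host check is the part you would adapt to your own list. As a minimal standalone sketch (not part of the tutorial), here is the same idea applied to the `storages` list from the question. The helper name `is_external` is hypothetical, and it uses `hostname` instead of `netloc` on the assumption that you want to ignore ports and letter case; `netloc` would keep a port such as `foo.com:8080` and fail to match `foo.com`.

```python
from urllib.parse import urlparse

# Allow-list taken from the question's example.
storages = ['foo.com', 'bar.net']


def is_external(url, allowed=storages):
    """Return True when the URL names a host that is not in the allow-list."""
    # hostname is None for relative URLs such as /path/to/image.jpg,
    # and it drops any port or user info that netloc would include.
    host = urlparse(url).hostname
    return host is not None and host not in allowed


print(is_external('/path/to/image.jpg'))             # False: relative, so local
print(is_external('http://foo.com/image.png'))       # False: approved host
print(is_external('http://external.com/image.png'))  # True: unknown host
print(is_external('http://foo.com:8080/image.png'))  # False: port is ignored
```

Whether a port or a subdomain should count as the same storage is a policy decision, so treat the comparison above as a starting point.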
Testing it out:
>>> import markdown
>>> # Assumes the code above is saved as image_extension.py
>>> from image_extension import ImageExtension
>>> text = """
... ![a local image](/path/to/image.jpg)
...
... ![a remote image](http://example.com/image.jpg)
...
... ![an excluded remote image](http://exclude.com/image.jpg)
... """
>>> print(markdown.markdown(text, extensions=[ImageExtension(hosts=['example.com'])]))
<p><img alt="a local image" src="/path/to/image.jpg"/></p>
<p><img alt="a remote image" src="http://example.com/image.jpg"/></p>
<p><a href="http://exclude.com/image.jpg">an excluded remote image</a></p>
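If you want to see what the tree rewriting does without involving Markdown at all, the same steps can be run against a hand-built ElementTree. This is only a sketch using the standard library; the function name `images_to_links` and the sample HTML are made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def images_to_links(root, allowed_hosts):
    """Turn <img> elements from unapproved hosts into <a> elements, in place."""
    # Collect the matches first so the tree is not changed while it is walked.
    for element in list(root.iter('img')):
        attrib = dict(element.attrib)  # copy, because clear() may empty the original
        host = urlparse(attrib.get('src', '')).netloc
        if host and host not in allowed_hosts:
            tail = element.tail        # clear() also drops the tail text
            element.clear()
            element.tag = 'a'
            element.set('href', attrib.pop('src'))
            element.text = attrib.pop('alt', '')
            element.tail = tail
            for key, value in attrib.items():
                element.set(key, value)


root = ET.fromstring(
    '<div><p><img alt="ok" src="http://example.com/a.jpg"/></p>'
    '<p><img alt="blocked" src="http://exclude.com/b.jpg" title="t"/> tail</p></div>'
)
images_to_links(root, ['example.com'])
print(ET.tostring(root, encoding='unicode'))
```

The image from the approved host is left alone, while the other one becomes a link whose text is the old alt text, with its title attribute and trailing text preserved.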
Full disclosure: I am the lead developer of Python-Markdown. We needed another tutorial which demonstrated some additional features of the extension API. I saw this question and thought it would make a good candidate. Therefore, I wrote up the tutorial, which steps through the development process to end up with the result above. Thank you for the inspiration.