python awk sed markdown marko

Prepend a url to all relative image links in a markdown document


I have a bunch of markdown documents with a mix of relative and absolute image destinations. e.g.

This is some text

![optional caption](/sub/folder/image.png)

And more text

![](https://example.com/cool_image.png)

I want to prepend a URL to each of the relative images, e.g. to change the above to

This is some text

![optional caption](https://some-image-host/image-host-subpath/sub/folder/image.png)

And more text

![](https://example.com/cool_image.png)

but preferably without hard-coding /sub/folder/ into the replace script (which is how I currently do it).

Is there a clever way to do this with awk or sed or is that a bad idea due to markdown having more edge cases than one expects?

I made some progress with https://pypi.org/project/marko/, e.g.

import marko

# Read the whole markdown file into a string.
with open("myfile.md") as f:
    s = f.read()

doc = marko.inline.parser.parse_inline(s)

for i, e in enumerate(doc):
    if isinstance(e, marko.inline.Image):
        # Only relative destinations get the host prefix.
        if not e.dest.startswith("http"):
            doc[i].dest = "https://some-image-host/image-host-subpath/" + doc[i].dest

This finds all the images and prepends the URL to the destination of each relative one, but I'm not sure how to render this list of inline elements back into a markdown string. I figured I would post here before re-inventing the wheel, in case there is a much simpler way of doing this.

TIA for any help.


Solution

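  • This command prints the result without altering the original file:

    sed '/^!\[.*\](http/!s_\(^!\[.*\](\)_\1https://some-image-host/image-host-subpath_' <input_file
    

    Once you've confirmed the output is what you want, add -i after sed (before the quoted script) and remove the < before input_file to edit the file in place:

    sed -i '/^!\[.*\](http/!s_\(^!\[.*\](\)_\1https://some-image-host/image-host-subpath_' input_file
    

    The way the command works is as follows: the address /^!\[.*\](http/! skips lines whose image destination already starts with http, so absolute links are left alone. On every other line, the s command (using _ as its delimiter, so the slashes in the URL need no escaping) captures the leading ![...]( in group 1 and writes it back as \1 followed by the URL, which puts the URL directly in front of the original destination.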

    A simpler way would have been to replace the ]( part of the line with ](your_url_here:

    sed 's_](_](https://some-image-host/image-host-subpath_' <input_file
    

    but it's possible that the ]( combination might be found on other lines of your files (an ordinary [text](link), for example), so I opted for the stronger test ^!\[.*\]( which only matches lines that begin with ![ and contain ]( further along.
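
Since the question started in Python, the same idea can also be sketched with the standard re module. This is a minimal sketch under stated assumptions, not a full markdown parser: the names BASE_URL, IMAGE, ABSOLUTE and prepend_base are made up for illustration, a destination counts as absolute if it starts with a URL scheme or with //, and images inside code blocks, reference-style images and destinations containing spaces or parentheses are not handled. Unlike the line-anchored sed test, it also covers several images on one line.

```python
import re

# Assumed base URL, taken from the example in the question.
BASE_URL = "https://some-image-host/image-host-subpath"

# Inline image: "![alt](" followed by a destination without spaces or ")".
IMAGE = re.compile(r'(!\[[^\]]*\]\()([^)\s]+)')

# Already-absolute destinations: a scheme such as "https:" or a "//host" prefix.
ABSOLUTE = re.compile(r'^(?:[a-zA-Z][a-zA-Z0-9+.-]*:|//)')


def prepend_base(markdown: str, base_url: str = BASE_URL) -> str:
    """Return markdown with base_url prepended to relative image destinations."""
    def fix(match: re.Match) -> str:
        prefix, dest = match.group(1), match.group(2)
        if ABSOLUTE.match(dest):
            return match.group(0)  # leave absolute links untouched
        # Join base and destination with exactly one slash.
        return prefix + base_url.rstrip("/") + "/" + dest.lstrip("/")

    return IMAGE.sub(fix, markdown)


if __name__ == "__main__":
    sample = "![optional caption](/sub/folder/image.png)\n![](https://example.com/cool_image.png)\n"
    print(prepend_base(sample), end="")
```

Nothing about /sub/folder/ is hard-coded: whatever relative path appears in the destination is kept and only the host prefix is added.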