I have used pandoc with the option --self-contained to create HTML documents where images are embedded in the HTML code as base64.
The image is included in the IMG tag like this (where I have replaced the long string of base64 characters with a placeholder):
<IMG src="data:image/png;base64,<<base64-coded characters here>>" width="672">
Now I'd like to extract such images, i.e. do the reverse: replace the base64-coded data with references to files, and convert the data to ordinary PNG or JPEG files saved on disk.
I was hoping to use pandoc for this conversion, but I could not find an option for it in pandoc, nor have I found any other software that does it. Ideally, the solution should be a shell command or script that can easily be included in a longer toolchain.
You can use pandoc with the --extract-media option. The images will be written to the supplied directory and the base64 URLs will be replaced with references to those files.
E.g.
pandoc --from=html YOUR_FILE.html --extract-media=images
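If pandoc is not available at some step of the toolchain, the same idea can be approximated with a short script. The following is only a minimal sketch, not what pandoc does internally: the names `extract_images` and `DATA_URI` and the `image1.png`, `image2.jpg`, ... file naming are made up for illustration, it uses a regular expression rather than a real HTML parser, and it assumes the images are PNG, JPEG or GIF data URLs inside quoted `src` attributes.

```python
import base64
import re
from pathlib import Path

# Matches src="data:image/<type>;base64,<payload>" with single or double quotes.
# Assumption: only png, jpeg and gif are handled in this sketch.
DATA_URI = re.compile(
    r'src=(["\'])data:image/(png|jpeg|gif);base64,([A-Za-z0-9+/=\s]+)\1',
    re.IGNORECASE,
)


def extract_images(html, out_dir="images"):
    """Write each embedded image to out_dir and return HTML that points at the files."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    counter = 0

    def replace(match):
        nonlocal counter
        counter += 1
        quote = match.group(1)
        subtype = match.group(2).lower()
        payload = match.group(3)
        ext = "jpg" if subtype == "jpeg" else subtype
        target = out / f"image{counter}.{ext}"
        # b64decode ignores whitespace by default, so wrapped payloads are fine.
        target.write_bytes(base64.b64decode(payload))
        return f"src={quote}{target.as_posix()}{quote}"

    return DATA_URI.sub(replace, html)
```

To use it, read the HTML file into a string, call `extract_images(html, "images")`, and write the returned string to a new file. For anything beyond simple cases, the pandoc command above is probably the more robust choice.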