python regex markdown

How to apply a string method to a regular expression match in Python


I have a Markdown file that is slightly broken: links and images that are too long contain line breaks. I would like to remove the line breaks from them.

Example:

from:

See for example the
[installation process for Ubuntu
Trusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty). The
project offers a Vagrant installation too, but the documentation only admits
that you know what you do, that you are a developer. If it is difficult to

![https://diasporafoundation.org/assets/pages/about/network-
distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-
distributed-e941dd3e345d022ceae909beccccbacd.png)

_A pretty decentralized network (Source: <https://diasporafoundation.org/>)_

to:

See for example the
[installation process for Ubuntu Trusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty). The
project offers a Vagrant installation too, but the documentation only admits
that you know what you do, that you are a developer. If it is difficult to

![https://diasporafoundation.org/assets/pages/about/network-distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-distributed-e941dd3e345d022ceae909beccccbacd.png)

_A pretty decentralized network (Source: <https://diasporafoundation.org/>)_

As you can see in this snippet, I managed to match all the links and images with the right pattern: https://regex101.com/r/uL8pO4/2

But now, what is the Python syntax for applying a string method (something like string.trim()) to what I have captured with the regular expression?

For the moment, I'm stuck with this:

fix_newlines = re.compile(r'\[([\w\s*:/]*)\]\(([^()]+)\)')
# Capture the links and remove line-breaks from their URLs
# Something like r'[\1](\2)'.trim() ??
post['content'] = fix_newlines.sub(r'[\1](\2)', post['content'])

Edit: I updated the example to be more explicit about my problem.

Thank you for your answer


Solution

  • Alright, I finally found what I was looking for. With the snippet below, I can capture each link with a regex and then process each match with a function.

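    import re

    def remove_newlines(match):
        # match.group() is the whole matched link; drop every line break inside it
        return "".join(match.group().strip().split('\n'))

    # Matches [label](target); the label may span several lines
    links_pattern = re.compile(r'\[([\w\s*:/\-\.]*)\]\(([^()]+)\)')
    # sub() calls remove_newlines once per match and inserts its return value
    post['content'] = links_pattern.sub(remove_newlines, post['content'])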
    

    Thank you for your answers and sorry if my question wasn't explicit enough.
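
One caveat with the snippet above: joining on an empty string also glues together words that were wrapped in the link text, so "Ubuntu" followed by a line break and "Trusty" becomes "UbuntuTrusty" rather than "Ubuntu Trusty" as in the expected output. Below is a minimal sketch of a variant that treats the label and the target separately. It assumes that a line break right after a hyphen splits a single token and that any other line break in the label stands for a space. The helper names `join_wrapped`, `fix_link` and `fix_wrapped_links` are made up for this illustration.

```python
import re

# Same pattern as in the snippet above: group 1 is the label, group 2 the target.
LINK_PATTERN = re.compile(r'\[([\w\s*:/\-\.]*)\]\(([^()]+)\)')


def join_wrapped(text):
    """Rejoin hard-wrapped text (assumption: a break after '-' splits one token)."""
    # "network-\ndistributed" -> "network-distributed"
    text = re.sub(r'-\s*\n\s*', '-', text)
    # "Ubuntu\nTrusty" -> "Ubuntu Trusty"
    return re.sub(r'\s*\n\s*', ' ', text)


def fix_link(match):
    """Replacement callback: called by sub() once per matched link or image."""
    label = join_wrapped(match.group(1))
    # A target should not contain whitespace, so drop the line breaks entirely.
    target = re.sub(r'\s*\n\s*', '', match.group(2))
    return '[{}]({})'.format(label, target)


def fix_wrapped_links(content):
    """Return content with line breaks removed from every matched link."""
    return LINK_PATTERN.sub(fix_link, content)


sample = ("[installation process for Ubuntu\n"
          "Trusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty)")
print(fix_wrapped_links(sample))
# [installation process for Ubuntu Trusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty)
```

Text outside the matched links is left alone, so ordinary paragraph line breaks survive. With the data from the question, the call would be `post['content'] = fix_wrapped_links(post['content'])`.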