
grep: how to extract only one link from a website


I have searched a lot, but nothing really helped me find a solution to my question. I am still learning regex and have had some success, but in this case I can't get the result I want.

I am writing scripts to update the installation packages on our installation server. They should download the newest setup.exe and build a new package so that it can be deployed to the clients.

I download the web page and try to find the right link in it:

wget --no-check-certificate https://www.thunderbird.net/en-US/thunderbird/all/ -q -O- | grep -o https://download\.mozilla\.org/\?product=thunderbird-.*-SSL\&os=win64\&lang=de

It should ignore the version number, and it gets me most of the way to the final solution. The result is:

https://download.mozilla.org/?product=thunderbird-91.6.0-SSL&os=win64&lang=de
https://download.mozilla.org/?product=thunderbird-91.6.0-msi-SSL&os=win64&lang=de

But what I need is just the first one: https://download.mozilla.org/?product=thunderbird-91.6.0-SSL&os=win64&lang=de

I know I can match the "91.6.0" with a regex, but how do I do that correctly? What if the 91 becomes 100 or more, or a version looks like 95.4.0.2 (for example)?

Thanks for help.

Denise


Solution

  • You can forbid hyphens between thunderbird- and -SSL with the negated bracket expression [^-]*, which matches any run of characters other than a hyphen, so the -msi- variant no longer matches:

    https://download\.mozilla\.org/\?product=thunderbird-[^-]*-SSL\&os=win64\&lang=de
    


    Alternatively, allow only digits and dots in the version part with [0-9.]*, which also covers versions such as 100.0 or 95.4.0.2:

    https://download\.mozilla\.org/\?product=thunderbird-[0-9.]*-SSL\&os=win64\&lang=de
    

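As a quick offline check of the [0-9.]* variant, here is a minimal sketch that runs grep over sample text instead of the live page. The first two URLs are the ones from the question. The 102.0 and 95.4.0.2 entries are made up for illustration, and the variable names are arbitrary. Unlike the command in the question, the pattern is single-quoted here, so the shell leaves it alone. That is also why ? and & are written without backslashes: in a quoted pattern, GNU grep would read \? as "zero or one", whereas a bare ? is literal in grep's default basic syntax.

```shell
# Sample input: the two URLs from the question plus two invented versions
# (102.0 and 95.4.0.2) to cover longer and differently shaped version numbers.
urls='https://download.mozilla.org/?product=thunderbird-91.6.0-SSL&os=win64&lang=de
https://download.mozilla.org/?product=thunderbird-91.6.0-msi-SSL&os=win64&lang=de
https://download.mozilla.org/?product=thunderbird-102.0-SSL&os=win64&lang=de
https://download.mozilla.org/?product=thunderbird-95.4.0.2-msi-SSL&os=win64&lang=de'

# Basic regular expression: only the dots of the host name need escaping.
# [0-9.]* accepts digits and dots, so "-msi" cannot sit between the
# version number and "-SSL".
pattern='https://download\.mozilla\.org/?product=thunderbird-[0-9.]*-SSL&os=win64&lang=de'

# -o prints only the matched part of each matching line.
matches=$(printf '%s\n' "$urls" | grep -o "$pattern")
printf '%s\n' "$matches"
```

This prints only the two non-msi links. Against the real page, the same quoted pattern can replace the unquoted one in the wget ... | grep -o command from the question.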