python regex windows sed

Surrounding whitespace-separated URLs with quotes using sed


Problem

I was trying to get a sed command to do the same thing I can do with the Python regex flavour, but I ran into some problems.

Python regex example: (tested it on regex101 and it was working fine)

find: (https.*?)

replace: "\1"

Unsuccessful code:

sed 's/\(https.*?\)[:space:]/\"\1\"/g' .\elenco.txt

elenco.txt file:

https://www.youtube.com/watch?app=desktop&v=Ot34P0yyQqI&t=984s https://www.youtube.com/watch?v=vviniZjvDQs  https://www.youtube.com/watch?v=Ih7qgkyo_oo  https://www.youtube.com/watch?v=X6UEDpwI3HI  https://www.youtube.com/watch?v=nShgaRMNlLw  https://www.youtube.com/watch?v=nd_jN-C_Juw  https://www.youtube.com/watch?v=aOtqox2uB3Y

Expected output:

"https://www.youtube.com/watch?app=desktop&v=Ot34P0yyQqI&t=984s" "https://www.youtube.com/watch?v=vviniZjvDQs"  "https://www.youtube.com/watch?v=Ih7qgkyo_oo"  "https://www.youtube.com/watch?v=X6UEDpwI3HI"  "https://www.youtube.com/watch?v=nShgaRMNlLw"  "https://www.youtube.com/watch?v=nd_jN-C_Juw"  "https://www.youtube.com/watch?v=aOtqox2uB3Y"

Actual output:

"https://www.youtube.com/watch?"pp=desktop&v=Ot34P0yyQqI&t=984s https://www.youtube.com/watch?v=vviniZjvDQs  https://www.youtube.com/watch?v=Ih7qgkyo_oo  https://www.youtube.com/watch?v=X6UEDpwI3HI  https://www.youtube.com/watch?v=nShgaRMNlLw  https://www.youtube.com/watch?v=nd_jN-C_Juw  https://www.youtube.com/watch?v=aOtqox2uB3Y

Info

OS

Name: Microsoft Windows 11 Home, Version: 10.0.26100 N/A Build 26100

sed was installed with: winget install bmatzelle.Gow

I've always avoided using POSIX regex etc, as I found it unnecessarily complicated / limited compared to using perl/python etc. and the regex flavour available there.

Are there any options other than installing Perl/Python? 200 MB for Strawberry Perl (Perl on Windows) seems like overkill and useless bloat just to get access to Perl-flavour regex, and sed, unlike Perl, doesn't support 'easy' regex...

https://askubuntu.com/questions/1050693/sed-with-pcre-like-grep-p


Solution

  • Ahoy!

    It's pretty trivial to do something like this in Perl. I don't know, 200 MB seems pretty small these days. You can even do this with Windows Subsystem for Linux (WSL): install WSL, run bash from a command prompt, then run sudo apt install perl. I use WSL from the command line in Windows all the time. It's very small and incredibly useful. PCRE regular expressions are really useful because they are portable, so you don't have to rewrite your regular expressions for every minor wrinkle in every language.

    Basically, look for a run of non-space characters, up to a space or the end of the line. Capture that run and put quotes around it in a global substitution.

    Here is the code, golfed to 21 characters...

    $ perl -pe 's/(\S+)( |$)+/"\1" /g' elenco.txt 
    "https://www.youtube.com/watch?app=desktop&v=Ot34P0yyQqI&t=984s" "https://www.youtube.com/watch?v=vviniZjvDQs" "https://www.youtube.com/watch?v=Ih7qgkyo_oo" "https://www.youtube.com/watch?v=X6UEDpwI3HI" "https://www.youtube.com/watch?v=nShgaRMNlLw" "https://www.youtube.com/watch?v=nd_jN-C_Juw" "https://www.youtube.com/watch?v=aOtqox2uB3Y"
    

    Apart from runs of spaces being collapsed to a single space (and a trailing space at the end of the line), the output matches your expected output. If you can install Perl, it is probably worth your time to do so. It gets really difficult to manage regular expressions across different languages unless they are all PCRE. IMO Perl is better in every way than both sed and awk.
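
    If installing Perl is not an option, the same "run of non-space characters" idea can probably be expressed in plain POSIX sed as well. Judging by the actual output in the question, the original attempt failed for two reasons: in sed's basic regex syntax ? is a literal question mark rather than a lazy quantifier, and [:space:] without the outer brackets is read as a set of the characters :, s, p, a, c and e, which is why the a of app disappeared. The sketch below is untested against the Gow build of sed, and the function name quote_urls is made up for illustration.

    ```shell
    # Hypothetical helper: wrap every run of non-whitespace characters in double quotes.
    #   [^[:space:]][^[:space:]]*  one or more non-whitespace characters (POSIX BRE has no +)
    #   &                          in the replacement, the whole matched text
    # Whitespace between the URLs is left exactly as it was.
    quote_urls() {
        sed 's/[^[:space:]][^[:space:]]*/"&"/g' "$@"
    }

    # Usage (assumed file name from the question):
    #   quote_urls elenco.txt
    ```

    This is written for a POSIX shell such as bash. From PowerShell, the embedded double quotes may need to be escaped as \" in the way the question's command does.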

    To modify the original input file with the quoted URLs you could run something like this...

    $ perl -i -pe 's/(\S+)( |$)+/"\1" /g' elenco.txt  
    

    However, this is dangerous during testing: the original file is overwritten, so its contents are lost unless you have a backup. It is probably safer to run something like this...

    $ perl -pe 's/(\S+)( |$)+/"\1" /g' elenco.txt > updated_elenco.txt
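
    A possible middle ground, sketched here as an assumption rather than something from the run above: giving -i a suffix makes Perl keep a copy of the original file before editing it in place. The .bak suffix and the function name quote_in_place are arbitrary choices for illustration.

    ```shell
    # Hypothetical helper: quote every whitespace-separated token in the given file,
    # editing it in place while Perl keeps the untouched original as FILE.bak.
    # The substitution is the same one used in the commands above; the final "$1"
    # is the shell's first function argument (the file name), not a regex capture.
    quote_in_place() {
        perl -i.bak -pe 's/(\S+)( |$)+/"\1" /g' "$1"
    }

    # Usage (assumed file name from the question):
    #   quote_in_place elenco.txt    # result in elenco.txt, original in elenco.txt.bak
    ```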
    

    Good Luck!