regex bash sed keyboard-maestro

Regex to convert a www.evernote.com URL to use the evernote protocol


I'm writing a simple script that will take URLs pointing to Evernote notes online, and convert them to the evernote:/// protocol. The regex I'm using matches and modifies the URL correctly when I try it out in a regex tester (I'm using Patterns for OS X). However, when I use it with sed, it just returns the original string.

echo "https://www.evernote.com/shard/s2/nl/227468/1875e55a-e512-4cf9-9b18-9e93c6a27359/" | sed 's#https?:_/_/www_.evernote_.com_/shard_/(..)_/nl_/(......)_/(.+_/)#evernote:_/_/_/view_/$2_/$1_/$3$3#'

Any idea why this isn't working? Thanks!

fort

[Edit: In case anyone's interested, this was for the AppleScript bit of a Keyboard Maestro macro:

set theURL to the clipboard
set ENcode to "echo \"" & theURL & "\" | sed -E 's#https?://www.evernote.com/shard/(..)/nl/(.*)/(.+/)#evernote:///view/\\2/\\1/\\3\\3#' | pbcopy"
do shell script ENcode

Thanks to @DreadPirateShawn for helping me fix the regex. ]


Solution

  • Using the extended regex flag -E, removing the underscores, and replacing each $1-style backreference with \1 yields a functional regex here:

    $ echo "https://www.evernote.com/shard/s2/nl/227468/1875e55a-e512-4cf9-9b18-9e93c6a27359/" | sed -E 's#https?://www\.evernote\.com/shard/(..)/nl/(......)/(.+/)#evernote:///view/\2/\1/\3\3#'
    evernote:///view/227468/s2/1875e55a-e512-4cf9-9b18-9e93c6a27359/1875e55a-e512-4cf9-9b18-9e93c6a27359/
    

    (Confirmed on Ubuntu 12.04 and OS X.)

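    If you don't use -E, sed falls back to basic regular expressions, where ? is not a portable quantifier. In that case you also need to change s? to [s]* (slightly looser, since it accepts zero or more s characters) and escape the grouping parentheses: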

    $ echo "https://www.evernote.com/shard/s2/nl/227468/1875e55a-e512-4cf9-9b18-9e93c6a27359/" | sed 's#http[s]*://www\.evernote\.com/shard/\(.*\)/nl/\(.*\)/\(.*/\)#evernote:///view/\2/\1/\3\3#'
    evernote:///view/227468/s2/1875e55a-e512-4cf9-9b18-9e93c6a27359/1875e55a-e512-4cf9-9b18-9e93c6a27359/
    

    In the latter example, I also replaced each fixed-length group such as (..) or (......) with (.*). Unless you're absolutely positive of the length of each sequence (and perhaps even then), the (.*) approach will be a bit more flexible.
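
As a rough sketch of how the -E variant might be packaged for reuse, the conversion can be wrapped in a small shell function. The function name evernote_url is hypothetical, and the use of [^/]+ in place of (.*) is an assumption of this sketch rather than part of the answer: it restricts each group to a single path segment so that no group can run across a slash.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for illustration only): rewrite an
# https://www.evernote.com/shard/... URL as an evernote:/// link.
evernote_url() {
    # printf avoids echo's platform-dependent handling of backslashes.
    # -E selects extended regular expressions, so ? and ( ) need no escaping.
    # [^/]+ matches exactly one path segment (assumption: segments are non-empty).
    printf '%s\n' "$1" |
        sed -E 's#^https?://www\.evernote\.com/shard/([^/]+)/nl/([^/]+)/([^/]+/)#evernote:///view/\2/\1/\3\3#'
}

# Same sample URL as in the question.
evernote_url "https://www.evernote.com/shard/s2/nl/227468/1875e55a-e512-4cf9-9b18-9e93c6a27359/"
```

With the sample URL this should print the same evernote:///view/227468/s2/... line as the commands above. Input that does not match the pattern is passed through unchanged, which is the same symptom the question describes, so a caller may want to compare the output with the input to detect a failed match.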