regex, url, justin.tv

Regex to extract the channel name from a justin.tv URL


I would like to know if anybody can help me with a regular expression problem. I want to write a regular expression to catch URLs similar to this URL:

www.justin.tv/channel_name_here

I have tried:

/justin\.tv\/(.*)

The problem I get is that when this channel goes live, sometimes the URL transforms to something like this:

www.justin.tv/channel_name_here#/w/45365675688

I can't catch this. :( Can anybody please help me with this? I just want to catch the channel name without the pound symbol and the rest of the URL.

Here are some example URLs:

www.justin.tv/winning_movies#/w/6347562128
http://www.justin.tv/cine_accion_hd16#/w/6347562128/18
http://www.justin.tv/fox_movies_hd1/

I would like to get:

winning_movies
cine_accion_hd16
fox_movies_hd1

Thanks in advance! :)


Solution

  • Short answer:

    (?<=justin\.tv\/)([^#\/]+)
    

    Long answer:

    Let's split this up into parts. Look at the back part first.

    ([^#\/]+)
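
A quick way to see what this piece does on its own is to run it against the part of a URL that follows "justin.tv/". The sketch below uses Python's `re` module, which is an assumption (the question does not say which language or engine is in use), and the variable names are made up for illustration. The `/` is left unescaped because Python patterns have no slash delimiters.

```python
import re

# Hypothetical sample: everything after "justin.tv/" in one of the example URLs.
fragment = "cine_accion_hd16#/w/6347562128/18"

# [^#/]+ matches a run of one or more characters that are neither '#' nor '/'.
pieces = re.findall(r"[^#/]+", fragment)
print(pieces)

# Anchored at the start of the fragment, the first run is the channel name.
first_piece = re.match(r"[^#/]+", fragment).group(0)
print(first_piece)
```

On its own the character class finds every run between the separators, which is why the full expression also needs something to pin the match to the spot right after "justin.tv/".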
    

    This matches a run of one or more characters that are neither '#' nor '/', so the match stops at the first '#' or '/' it reaches. Now let's look at the first part.

    (?<=justin\.tv\/)
    

    The syntax "(?<=" followed by ")" is called a positive lookbehind, one of several types of lookaround. It requires that the enclosed text appear immediately before the current position, but that text does not become part of the match. Using a simple example:

    (?<=A)B
    

    The above example says "I want every 'B' that comes immediately after an 'A'." Applied to our full expression, we're saying we want the run of characters (containing no '#' or '/') that comes immediately after "justin.tv/".
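
The simple lookbehind example can be checked directly. This is a minimal sketch in Python (an assumed choice of language), with a made-up test string; it records where each match starts so it is clear which 'B' characters qualify.

```python
import re

# Hypothetical test string containing 'B' in several contexts.
text = "AB BB CAB"

# (?<=A)B matches a 'B' only when the character just before it is 'A'.
# The 'A' itself is checked but not included in the match.
matches = list(re.finditer(r"(?<=A)B", text))
positions = [m.start() for m in matches]
matched_text = [m.group(0) for m in matches]

print(positions)     # indexes of the 'B's that follow an 'A'
print(matched_text)  # each match is just "B", never "AB"
```

Only the 'B' at index 1 and the 'B' at index 8 qualify; the two in "BB" are preceded by a space and by another 'B', so they are skipped.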

    Look here for an example of the expression in action.
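
For a self-contained check, here is a sketch that runs the expression over the three example URLs from the question. It assumes Python's `re` module; the function names `channel_name` and `channel_name_group` are hypothetical. Because some regex engines do not support lookbehind, the sketch also includes a variant that uses an ordinary capture group instead, which is a substitute technique and not part of the answer above.

```python
import re

# The expression from the answer (slashes unescaped, since Python has no delimiters).
LOOKBEHIND_RE = re.compile(r"(?<=justin\.tv/)[^#/]+")

# Alternative for engines without lookbehind: match the prefix, capture the rest.
GROUP_RE = re.compile(r"justin\.tv/([^#/]+)")


def channel_name(url):
    """Return the channel name using the lookbehind pattern, or None."""
    match = LOOKBEHIND_RE.search(url)
    return match.group(0) if match else None


def channel_name_group(url):
    """Return the channel name using the capture-group pattern, or None."""
    match = GROUP_RE.search(url)
    return match.group(1) if match else None


urls = [
    "www.justin.tv/winning_movies#/w/6347562128",
    "http://www.justin.tv/cine_accion_hd16#/w/6347562128/18",
    "http://www.justin.tv/fox_movies_hd1/",
]

names = [channel_name(u) for u in urls]
names_group = [channel_name_group(u) for u in urls]

for url, name in zip(urls, names):
    print(f"{url} -> {name}")
```

Both variants give the same channel names here. With the lookbehind version the whole match is the channel name; with the capture-group version the name is in group 1.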