javascript regex

Removing http:// or https:// and www


I am trying to build a URL cleaner.

I have a list of URLs and want to strip any leading https://, http://, www., etc., as well as everything from the first / after the domain onward.

I have tried the following regex: url.replace(/^https?\:\/\/www\./i, "").split('/')[0];

This works to a certain extent and outputs the following:

"www.net-temps.com"
"www.toplanguagejobs.com"
"http:"
"peopleready.com"
"nationjob.com"
"http:"
"bluesteps.com"
"https:"
"theguardian.com"
"reddit.com"
"youtube.com"
"https:"
"pgatour.com"
"cultofmac.com"

from the following list:

'www.net-temps.com',
'www.toplanguagejobs.com',
'http://nychires.com/',
'http://www.peopleready.com/',
'https://www.nationjob.com/',
'http://nationaljobsonline.com/',
'https://www.bluesteps.com/',
'https://medium.freecodecamp.com/how-we-got-our-2-year-old-open-source-project-to-trend-on-github-8c25b0a6dfe9#.nl4985bjz',
'https://www.theguardian.com/uk/business',
'https://www.reddit.com/r/funny/comments/5qzkz4/my_captain_friend_sent_me_this_photo_saudi_prince/',
'https://www.youtube.com/watch?v=Bua8k_CcnuI',
'https://stackoverflow.com/questions/7000995/jquery-removing-part-of-string-after-and-removing-too/7001040#7001040',
'http://www.pgatour.com/fantasy.html',
'http://www.cultofmac.com/464645/apple-spaceship-campus-flyover/'

If I remove the www\. from the regex, the scheme (http:// or https://) is stripped correctly, but I'd also like to remove the www. when it is present, whether or not the URL starts with a scheme.

This is what I have coded so far:

https://jsfiddle.net/xba5x9ro/1/

In the future, once this is sorted, I would like to take a list of URLs from a textarea, run makeDomainBeautiful on each one, and output the results to another textarea, but I thought I'd get this working first.


Solution

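  • Use /^(?:https?:\/\/)?(?:www\.)?/i. The original pattern only matched when both the scheme and www. were present, so a URL with just one of them was left untouched: split('/')[0] then returned "http:" or "https:", or the www. stayed in place. Here https:// and www. are each wrapped in a non-capturing group ((?:...)) and made optional with ?, so either, both, or neither can appear.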

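    // Ask for a URL; fall back to "" if the prompt is cancelled (prompt returns null).
    var url = prompt("url: ") || "";
    
    // Strip an optional scheme and an optional "www.", then keep everything before the first "/".
    url = url.replace(/^(?:https?:\/\/)?(?:www\.)?/i, "").split('/')[0];
    
    alert("url: " + url);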
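
To apply the same regex to a whole list, as the question plans to do with a textarea, something like the following might work. This is only a sketch: makeDomainBeautiful is the name used in the question, but its body here is a guess and not the code from the fiddle, and cleanUrlList is a hypothetical helper that assumes one URL per line. Reading from and writing to the actual textarea elements is left out.

```javascript
// Sketch only: the name comes from the question, the body is an assumption.
function makeDomainBeautiful(url) {
  return url
    .trim()
    // Optional scheme, then optional "www.", both anchored at the start.
    .replace(/^(?:https?:\/\/)?(?:www\.)?/i, "")
    // Keep only the part before the first "/".
    .split("/")[0];
}

// Hypothetical helper: cleans newline-separated text (e.g. a textarea value).
function cleanUrlList(text) {
  return text
    .split(/\r?\n/)
    .map(function (line) { return line.trim(); })
    .filter(Boolean) // skip empty lines
    .map(makeDomainBeautiful)
    .join("\n");
}

// A few of the URLs from the question.
var urls = [
  "www.net-temps.com",
  "http://nychires.com/",
  "https://www.nationjob.com/",
  "https://www.youtube.com/watch?v=Bua8k_CcnuI"
];

console.log(urls.map(makeDomainBeautiful));
// [ 'net-temps.com', 'nychires.com', 'nationjob.com', 'youtube.com' ]
```

Note that this only cuts at the first "/", so a URL with a query string or fragment directly after the host (no slash) would keep that part; handling that case would need an extra step.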