parsingurlgoogle-bigqueryurlparse

Grab the values from url that is in between specific characters BigQuery


I need to parse urls in order to grab a value that comes after .com/ AND before the next / character. My data looks like this:

url
https://www.delish.com/food-news/news/jdhgkjdf/100-years-of-christmas
https://www.delish.com/food-news/news/100-years-of-christmas

The desired output is:

new_string
food-news
food-news

I have tried the following:

SPLIT(url, '/')[SAFE_OFFSET(ARRAY_LENGTH(SPLIT(url, '/')) - 4)] AS new string

But because the URLs are not consistent, sometimes it grabs food_news, sometimes it grabs www.delish.com, that's why offset is not working in this particular case.


Solution

  • Use below

    regexp_extract(url, net.host(url) || r'/([^/]+)')