I have a string that I need to split at two separate points, but all I can find is how to split a string on delimiters like "," and other punctuation.
import re
string = '<p>The brown dog jumped over the... <a href="https://google.com" target="something">... but then splashed in the water<p>'
hyperlink = re.split(r'(?=https)', string)
print(hyperlink[0])
In the example above, I need to extract just the URL in the string, "https://google.com", and then print it out. However, I can only work out how to split the string at "https", so everything after the URL comes with it.
I hope this makes sense. After a bunch of searching and testing, I can't figure out how to do this.
There are many ways this can be achieved, but a simple one is using find() and then slicing.
find() returns the starting position of a substring in a string. Using that position, you can then slice the string there.
e.g.
string = '<p>The brown dog jumped over the... <a href="https://google.com" target="something">... but then splashed in the water<p>'
# Find where the URL starts
start_word = "https"
start_index = string.find(start_word)
# For URLs, we need to find where it ends - usually at a quote mark
end_index = string.find('"', start_index)
# Extract just the URL
result = string[start_index:end_index]
print(result)
Output:
https://google.com
The find() method returns the index where the substring begins, or -1 if the substring is not found. Then, using these positions, we slice the string to extract just the section we want.
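If the input might not contain a link at all, the same find()-and-slice approach can be wrapped in a small helper that checks for the -1 that find() returns on a miss. This is only a sketch: the function name extract_url and its parameters are made up for illustration, and it assumes the URL starts with "https" and ends at the next double quote.

```python
def extract_url(text, start_word="https", end_char='"'):
    """Return the text from start_word up to (not including) end_char, or None."""
    start_index = text.find(start_word)
    if start_index == -1:
        # find() returns -1 when the substring is missing
        return None
    end_index = text.find(end_char, start_index)
    if end_index == -1:
        # No closing character found: take everything to the end of the string
        return text[start_index:]
    return text[start_index:end_index]


string = '<p>The brown dog jumped over the... <a href="https://google.com" target="something">... but then splashed in the water<p>'
print(extract_url(string))
print(extract_url("no link here"))
```

Without the -1 check, a missing "https" would make the slice start from the last character of the string and return something misleading instead of failing clearly.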