I have various strings that contain HTML-like content. The links in them are not wrapped in <a> and </a> tags, so I need to find those links and add the anchor tags myself.
Part of my solution uses NSDataDetector with the type NSTextCheckingResult.CheckingType.link.rawValue:
import Foundation

let str = """
<div>
<p>Hello world, here's some links!</p>
<p>[1] https://news.ycombinator.com</p>
<p>[2] https://google.com</p>
</div>
"""

if let detector = try? NSDataDetector(types: NSTextCheckingResult.CheckingType.link.rawValue) {
    let matches = detector.matches(in: str, options: [], range: NSRange(str.startIndex..., in: str))
    for match in matches {
        // Convert the NSRange back to a Swift range to slice the original string
        if let range = Range(match.range, in: str) {
            let url = str[range]
            print("URL: \(url), \(match.url)")
        }
    }
}
However, this also picks up the trailing </p> after each link.
The output of the above is:
URL: https://news.ycombinator.com</p>, Optional(https://news.ycombinator.com%3C/p%3E)
URL: https://google.com</p>, Optional(https://google.com%3C/p%3E)
As far as I know, </p> is not valid in a link, yet it is being picked up.
Is this a bug?
Is it possible to prevent this?
NSDataDetector tries to extract URLs from plain natural-language text.
Apple's NSDataDetector documentation is very specific about this, especially the note at the end.
When using NSDataDetector, you should convert the HTML to plain text first, then extract the URLs.
Example code:
import Foundation
#if canImport(UIKit)
import UIKit    // provides the HTML-importing NSAttributedString initializer on iOS
#elseif canImport(AppKit)
import AppKit   // provides the same initializer on macOS
#endif

let str = """
<div>
<p>Hello world, here's some links!</p>
<p>[1] https://news.ycombinator.com</p>
<p>[2] https://google.com</p>
</div>
"""

print("----> Using NSDataDetector")
// Convert the HTML to plain text (the HTML importer should be called on the main thread)
if let data = str.data(using: .utf8),
   let attributedString = try? NSAttributedString(data: data,
                                                  options: [.documentType: NSAttributedString.DocumentType.html],
                                                  documentAttributes: nil) {
    let plainText = attributedString.string
    print("plainText: \n \(plainText)")
    // Run NSDataDetector on the plain text, where no tags follow the links
    if let detector = try? NSDataDetector(types: NSTextCheckingResult.CheckingType.link.rawValue) {
        let matches = detector.matches(in: plainText, options: [], range: NSRange(plainText.startIndex..., in: plainText))
        for match in matches {
            if let url = match.url {
                print("URL: \(url.absoluteString)")
            }
        }
    }
}

print("\n----> Using Regex")
// Alternative: match URLs directly in the HTML (Regex requires iOS 16 / macOS 13 or later).
// The character classes do not include "<", so a match stops before the closing tag.
let pattern = #"(http|ftp|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])"#
do {
    let regex = try Regex(pattern)
    let matches = str.ranges(of: regex)
    for range in matches {
        let match = str[range]
        print(match) // the matched URL text
    }
} catch {
    print("Failed to create regex")
}
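Since the original goal was to add the anchor tags, the regex variant can be taken one step further. The plain-text conversion discards the positions of the links in the original HTML, so the sketch below works on the HTML string directly and replaces each regex match with an <a> element. The helper name wrapLinks(in:matching:) and the shortened sample string are made up for illustration and are not part of the original answer. It also assumes that no link is already inside an <a> tag (such a link would be wrapped twice) and that the URLs need no HTML escaping (for example of "&") inside the href attribute.

```swift
// Hypothetical helper, not from the original answer:
// wraps every substring matched by `pattern` in an <a> tag.
// Requires the Swift Regex API (iOS 16 / macOS 13 or later).
func wrapLinks(in html: String, matching pattern: String) throws -> String {
    let regex = try Regex(pattern)
    return html.replacing(regex) { match -> String in
        // match.range is a range of indices into `html`
        let url = html[match.range]
        return "<a href=\"\(url)\">\(url)</a>"
    }
}

// Same URL pattern as in the regex example above
let urlPattern = #"(http|ftp|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])"#

// Shortened sample input (assumption: links appear as bare text)
let html = """
<p>[1] https://news.ycombinator.com</p>
<p>[2] https://google.com</p>
"""

// Fall back to the unchanged input if the pattern fails to compile
let linked = (try? wrapLinks(in: html, matching: urlPattern)) ?? html
print(linked)
```

Because the pattern's character classes exclude "<", each match ends before the closing </p>, so the tag stays outside the generated anchor. If some links may already be wrapped, one option is to parse the HTML properly instead of relying on a regex.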