I want to search a PDF for a regex and add annotations to it based on the matches. I have built a simple function that does this. As the amazing community (really amazing people who spent their time helping me) suggested, I can use precomposedStringWithCanonicalMapping to search for the desired expression correctly in the PDF, but afterwards, when I create a PDF selection to find the bounds of each match, the results don't line up. Here is my code.
import PDFKit

func performRegex(regex: String, on pdfPage: PDFPage) {
    // Normalize the page text before running the regex on it.
    guard let pdfString = pdfPage.string?.precomposedStringWithCanonicalMapping else { return }
    guard let safeRegex = try? NSRegularExpression(pattern: regex, options: .caseInsensitive) else { return }
    let results = safeRegex.matches(in: pdfString, options: .withoutAnchoringBounds, range: NSRange(pdfString.startIndex..., in: pdfString))
    // Remove any existing annotations before adding the new highlights.
    pdfPage.annotations.forEach { pdfPage.removeAnnotation($0) }
    results.forEach { result in
        let bbox = pdfPage.selection(for: result.range)?.bounds(for: pdfPage)
        let annotation = PDFAnnotation(bounds: bbox!, forType: .highlight, withProperties: nil)
        annotation.color = .yellow
        // Store the matched text as the annotation's contents.
        annotation.contents = String(pdfString[Range(result.range, in: pdfString)!])
        pdfPage.addAnnotation(annotation)
    }
}
The problem is that when I do this and enter the expression [0-9], all my results are shifted.
If I don't use precomposedStringWithCanonicalMapping, the results are not shifted, but I get an error whenever the text contains a special character.
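To illustrate the shift described above, here is a minimal sketch that uses only Foundation (no PDFKit). It assumes the page text stores an accented letter in decomposed form (a base letter followed by a combining mark), which is a guess about the PDF and not something confirmed here. The sample string and variable names are made up for the illustration.

```swift
import Foundation

// Assumption: the original text stores "ó" as "o" + U+0301 (combining acute accent).
let original = "comercializacio\u{0301}n 2021"
let normalized = original.precomposedStringWithCanonicalMapping

// The precomposed form is one UTF-16 unit shorter than the original.
print(original.utf16.count, normalized.utf16.count)

let regex = try! NSRegularExpression(pattern: "[0-9]")
let match = regex.firstMatch(in: normalized, range: NSRange(normalized.startIndex..., in: normalized))!

// The same NSRange points at different text in each string.
let inNormalized = (normalized as NSString).substring(with: match.range)
let inOriginal = (original as NSString).substring(with: match.range)
print("normalized: '\(inNormalized)', original: '\(inOriginal)'")
```

If pdfPage.selection(for:) interprets the range against the page's own, non-normalized text, every match after such a character would land one position too early, which would be consistent with the shift I am seeing.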
The problem (I suspect) is in this line of code.
let bbox = pdfPage.selection(for: result.range)?.bounds(for: pdfPage)
But I don't know of any workaround for it.
Any help would be greatly appreciated!
Thanks a lot
The only alternative I can think of right now is to use the original string and fix the malformed ranges. Try it like this:
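import Foundation

let str = """
circular para poder realizar sus tareas laborales correspondientes a las actividades de comercialización de alimentos
"""
do {
    // "." matches one code point at a time, so a combining accent is matched separately from its base letter.
    let regex = try NSRegularExpression(pattern: ".", options: .caseInsensitive)
    let results = regex.matches(in: str, options: .withoutAnchoringBounds, range: NSRange(location: 0, length: str.utf16.count))
    // Accumulates consecutive matches whose ranges do not fall on character boundaries.
    var badrange: NSRange?
    results.forEach { result in
        guard let range = Range(result.range, in: str) else {
            if badrange != nil {
                // Extend the pending range by this match and retry the conversion.
                badrange!.length += result.range.length
                if let range = Range(badrange!, in: str) {
                    let newStr = str[range]
                    print(newStr)
                }
            } else {
                // First malformed range: remember it and wait for the next match.
                badrange = result.range
            }
            return
        }
        // The range is valid as-is.
        let newStr = str[range]
        print(newStr)
        badrange = nil
    }
} catch {
    print(error)
}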
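A possible variation on the same idea, offered as a sketch and not as a tested PDFKit fix: instead of merging bad ranges by hand, each match range can be widened to whole composed character sequences with NSString's rangeOfComposedCharacterSequences(for:). The helper name snappedToCharacterBoundaries and the sample text are hypothetical, and the sample assumes the accent is stored in decomposed form.

```swift
import Foundation

// Hypothetical helper: widens `range` so it never splits a base letter from its combining marks.
func snappedToCharacterBoundaries(_ range: NSRange, in text: String) -> NSRange {
    return (text as NSString).rangeOfComposedCharacterSequences(for: range)
}

// Assumption: "ó" is stored as "o" + U+0301, as it might be in text extracted from a PDF.
let text = "alimentacio\u{0301}n 42"
let regex = try! NSRegularExpression(pattern: "o", options: .caseInsensitive)
let fullRange = NSRange(location: 0, length: text.utf16.count)

// Match against the original string, then snap every range to character boundaries.
let snapped = regex.matches(in: text, options: [], range: fullRange).map {
    snappedToCharacterBoundaries($0.range, in: text)
}

for nsRange in snapped {
    // After snapping, the conversion to a Swift range should no longer fail.
    if let range = Range(nsRange, in: text) {
        print(nsRange, text[range])
    }
}
```

Because the ranges still refer to the original string, they could then be passed to pdfPage.selection(for:) without the offset introduced by normalization. Whether PDFKit draws the highlight exactly as expected for such ranges would need to be checked on a real document.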