I'm using the OCR capabilities of Apple's Vision framework to recognize text in images. While I've had great success with horizontal text in Japanese, Korean, and Chinese, I'm encountering issues with vertical text.
Problem: When trying to recognize vertical text in these languages, the OCR returns nil.
What I've Tried:
(Example images of the vertical text omitted.)
Code Snippet:
import UIKit
import Vision

func ocr() {
    guard let image = UIImage(named: imageName) else {
        print("Failed to load image")
        return
    }
    guard let cgImage = image.cgImage else {
        print("Failed to get CGImage from UIImage")
        return
    }
    // Request handler
    let handler = VNImageRequestHandler(cgImage: cgImage, orientation: .right, options: [:])
    let recognizeRequest = VNRecognizeTextRequest { request, error in
        if let error = error {
            print("Failed to recognize text: \(error.localizedDescription)")
            return
        }
        // Parse the results as text
        guard let result = request.results as? [VNRecognizedTextObservation] else {
            print("No text found")
            return
        }
        // Take the top candidate for each observation
        let stringArray = result.compactMap { observation in
            observation.topCandidates(1).first?.string
        }
        let recognizedString = stringArray.joined(separator: "\n")
        let singleLineText = recognizedString
            .components(separatedBy: .newlines)
            .joined(separator: " ")
        DispatchQueue.main.async {
            self.recognizeText = singleLineText
        }
    }
    recognizeRequest.recognitionLanguages = ["ja"]
    recognizeRequest.revision = VNRecognizeTextRequestRevision3
    recognizeRequest.automaticallyDetectsLanguage = true
    recognizeRequest.recognitionLevel = .accurate
    recognizeRequest.usesLanguageCorrection = false
    do {
        try handler.perform([recognizeRequest])
    } catch {
        print("Failed to perform text recognition: \(error.localizedDescription)")
    }
}
After two weeks of trying Apple Vision, I concluded that it does not support vertical text directly. I therefore looked for alternative solutions and found that the Tesseract OCR library, a well-established engine with over 20 years of history (originally developed at Hewlett-Packard and later sponsored by Google), could address this issue. Specifically, the Tesseract tessdata repository contains a trained model for vertical Japanese text (jpn_vert.traineddata).
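Vision exposes no language variant or option for vertical layout. You can confirm at runtime what a given request revision does support; a minimal sketch, assuming iOS 15+ where the instance method supportedRecognitionLanguages() is available:

import Vision

// Sketch: list the languages this request/revision can recognize.
// The list contains languages only; there is no flag for vertical layout.
func printSupportedLanguages() {
    let request = VNRecognizeTextRequest()
    request.revision = VNRecognizeTextRequestRevision3
    request.recognitionLevel = .accurate
    do {
        let languages = try request.supportedRecognitionLanguages()
        print("Supported languages: \(languages)")
    } catch {
        print("Failed to query supported languages: \(error)")
    }
}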
For iOS, I used the SwiftyTesseract library, which is a more modern wrapper and worked well for my needs. Below are the steps I followed to get it up and running:
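If you use Swift Package Manager, the dependency can be declared roughly like this. This is a sketch: the package URL points at the SwiftyTesseract repository, the version is only an example (check the repository for the current release), and the target name is hypothetical. In Xcode you can instead use File → Add Packages with the same URL.

// swift-tools-version:5.5
import PackageDescription

let package = Package(
    name: "MyApp", // hypothetical package name
    platforms: [.iOS(.v13)], // adjust to your deployment target
    dependencies: [
        // Example version; check the repository for the current release
        .package(url: "https://github.com/SwiftyTesseract/SwiftyTesseract.git", from: "4.0.0")
    ],
    targets: [
        .target(
            name: "MyApp",
            dependencies: [.product(name: "SwiftyTesseract", package: "SwiftyTesseract")]
        )
    ]
)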
Steps:
1. Import SwiftyTesseract (after adding the package as shown above).
2. Download jpn_vert.traineddata from the Tesseract tessdata repository.
3. Create a folder named tessdata.
4. Add jpn_vert.traineddata to this folder.
5. Drag the tessdata folder into your Xcode project and select "Create folder references".
6. In Edit Scheme → Run → Environment Variables, add:
   name: TESSDATA_PREFIX
   value: $(PROJECT_DIR)/tessdata
   (A runtime check for this variable is sketched after the extension code below.)
7. Add this extension:
import SwiftyTesseract
import libtesseract

public typealias PageSegmentationMode = TessPageSegMode

public extension PageSegmentationMode {
    static let osdOnly = PSM_OSD_ONLY
    static let autoOsd = PSM_AUTO_OSD
    static let autoOnly = PSM_AUTO_ONLY
    static let auto = PSM_AUTO
    static let singleColumn = PSM_SINGLE_COLUMN
    static let singleBlockVerticalText = PSM_SINGLE_BLOCK_VERT_TEXT
    static let singleBlock = PSM_SINGLE_BLOCK
    static let singleLine = PSM_SINGLE_LINE
    static let singleWord = PSM_SINGLE_WORD
    static let circleWord = PSM_CIRCLE_WORD
    static let singleCharacter = PSM_SINGLE_CHAR
    static let sparseText = PSM_SPARSE_TEXT
    static let sparseTextOsd = PSM_SPARSE_TEXT_OSD
    static let count = PSM_COUNT
}

public extension Tesseract {
    // Bridges Tesseract's page segmentation mode into a Swift property
    var pageSegmentationMode: PageSegmentationMode {
        get {
            perform { tessPointer in
                TessBaseAPIGetPageSegMode(tessPointer)
            }
        }
        set {
            perform { tessPointer in
                TessBaseAPISetPageSegMode(tessPointer, newValue)
            }
        }
    }
}
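Before running OCR, it's worth confirming that TESSDATA_PREFIX is actually visible to the process (scheme environment variables only apply when the app is launched from Xcode). A minimal check, assuming the variable from step 6:

import Foundation

// Sketch: verify the environment variable points at the tessdata folder.
func checkTessdataPrefix() {
    if let prefix = ProcessInfo.processInfo.environment["TESSDATA_PREFIX"] {
        print("TESSDATA_PREFIX = \(prefix)")
    } else {
        print("TESSDATA_PREFIX is not set")
    }
}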
Usage:
func japaneseOCR() {
    let tesseract = Tesseract(languages: [.custom("jpn_vert")])
    tesseract.pageSegmentationMode = .singleBlockVerticalText
    guard let image = UIImage(named: imageName) else {
        print("Failed to load image")
        return
    }
    guard let imageData = image.jpegData(compressionQuality: 1.0) else {
        print("Failed to get JPEG data from image")
        return
    }
    let result: Result<String, Tesseract.Error> = tesseract.performOCR(on: imageData)
    self.recognizeText = (try? result.get()) ?? ""
}
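performOCR(on:) returns its Result synchronously, so for anything beyond a quick test I would move the work off the main thread. A sketch reusing the code above (japaneseOCRAsync is a hypothetical wrapper name):

func japaneseOCRAsync() {
    DispatchQueue.global(qos: .userInitiated).async {
        let tesseract = Tesseract(languages: [.custom("jpn_vert")])
        tesseract.pageSegmentationMode = .singleBlockVerticalText
        guard let imageData = UIImage(named: self.imageName)?
            .jpegData(compressionQuality: 1.0) else { return }
        let result: Result<String, Tesseract.Error> = tesseract.performOCR(on: imageData)
        // Publish the recognized text back on the main thread
        DispatchQueue.main.async {
            self.recognizeText = (try? result.get()) ?? ""
        }
    }
}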
Result: with the jpn_vert model and the .singleBlockVerticalText segmentation mode, Tesseract recognizes the vertical Japanese text that Vision could not.