swift swiftui ocr coreml apple-vision

Japanese vertical text recognition with VNRecognizeTextRequest not working


I'm using the Apple OCR capabilities provided by the Vision Framework to recognize text in images. While I've had great success with horizontal text in Japanese, Korean, and Chinese, I'm encountering issues with vertical text.

Problem: When I try to recognize vertical text in these languages, the OCR returns nil.

What I've Tried:

Example images:

[Image: example 1] [Image: example 2]

Code Snippet:

func ocr() {
    
    guard let image = UIImage(named: imageName) else {
        print("Failed to load image")
        return
    }
    
    guard let cgImage = image.cgImage else {
        print("Failed to get CGImage from UIImage")
        return
    }
    
    // Request handler
    let handler = VNImageRequestHandler(cgImage: cgImage, orientation: .right, options: [:])
    
    let recognizeRequest = VNRecognizeTextRequest { (request, error) in
                    
        if let error = error {
            print("Failed to recognize text: \(error.localizedDescription)")
            return
        }
        
        // Parse the results as text
        guard let result = request.results as? [VNRecognizedTextObservation] else {
            print("No text found")
            return
        }
        
        // Take the best candidate for each observation and flatten everything onto one line
        let singleLineText = result
            .compactMap { $0.topCandidates(1).first?.string }
            .joined(separator: "\n")
            .components(separatedBy: .newlines)
            .joined(separator: " ")

        DispatchQueue.main.async {
            self.recognizeText = singleLineText
        }
    }
    // Recognition settings
    recognizeRequest.recognitionLanguages = ["ja"]
    recognizeRequest.revision = VNRecognizeTextRequestRevision3
    recognizeRequest.automaticallyDetectsLanguage = true
    recognizeRequest.recognitionLevel = .accurate
    recognizeRequest.usesLanguageCorrection = false
    
    do {
        try handler.perform([recognizeRequest])
    } catch {
        print("Failed to perform text recognition: \(error.localizedDescription)")
    }

}
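
Since horizontal Japanese text is recognized, language support is probably not the cause, but it can be ruled out by asking Vision which languages the request supports for the chosen revision and recognition level. The following is a minimal sketch; the function name supportedLanguages is made up for illustration, and the check says nothing about text direction, only about language availability.

```swift
import Vision

/// Hypothetical helper: returns the language identifiers Vision reports
/// for an accurate-level request pinned to revision 3.
@available(iOS 16.0, macOS 13.0, tvOS 16.0, *)
func supportedLanguages() throws -> [String] {
    let request = VNRecognizeTextRequest()
    request.revision = VNRecognizeTextRequestRevision3
    request.recognitionLevel = .accurate
    return try request.supportedRecognitionLanguages()
}

if #available(iOS 16.0, macOS 13.0, tvOS 16.0, *) {
    do {
        let languages = try supportedLanguages()
        print("Supported languages: \(languages)")
        // Identifiers may be region-qualified, so match on the prefix.
        let hasJapanese = languages.contains { $0.hasPrefix("ja") }
        print("Japanese listed: \(hasJapanese)")
    } catch {
        print("Could not query supported languages: \(error.localizedDescription)")
    }
}
```

If Japanese is listed and horizontal samples work with the same settings, the failure is specific to the vertical layout rather than to the configuration above.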

Solution

  • After trying Apple Vision for 2 weeks, I concluded that it does not support vertical text directly. I therefore looked for alternatives and found that Tesseract, a long-established open-source OCR engine (originally developed at Hewlett-Packard and later sponsored by Google), could address this issue. Specifically, the Tesseract repository includes a trained model for vertical Japanese text (jpn_vert.traineddata).

    For iOS, I used the SwiftyTesseract library, which is more modern and worked well for my needs. Below are the steps I followed to get it up and running:

    Steps:

    1. Install SwiftyTesseract: add it to your project using Swift Package Manager.
    2. Import SwiftyTesseract in the file that performs the OCR.
    3. Download jpn_vert.traineddata (from the Tesseract repository mentioned above).
    4. Add the trained data file to your project.
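
    A quick way to confirm that step 4 worked is to look the file up in the app bundle before creating the Tesseract instance. The sketch below assumes the file sits in a bundled folder named tessdata, which is the layout SwiftyTesseract is understood to expect by default; adjust the folder name if your setup differs. The helper name trainedDataURL is hypothetical.

    ```swift
    import Foundation

    /// Hypothetical helper: returns the URL of `<language>.traineddata` inside a
    /// bundled folder (assumed to be named "tessdata"), or nil if it is missing.
    func trainedDataURL(for language: String,
                        in bundle: Bundle = .main,
                        folder: String = "tessdata") -> URL? {
        bundle.url(forResource: language,
                   withExtension: "traineddata",
                   subdirectory: folder)
    }

    if let url = trainedDataURL(for: "jpn_vert") {
        print("Found trained data at \(url.path)")
    } else {
        print("jpn_vert.traineddata is not in the bundle; check target membership and the folder reference")
    }
    ```

    If the lookup returns nil, fix the bundle contents before debugging the OCR call itself.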

    Add this extension:

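    import SwiftyTesseract
    import libtesseract // C API module that provides TessPageSegMode and the PSM_* constants
    
    public typealias PageSegmentationMode = TessPageSegMode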
    
    public extension PageSegmentationMode {
      static let osdOnly = PSM_OSD_ONLY
      static let autoOsd = PSM_AUTO_OSD
      static let autoOnly = PSM_AUTO_ONLY
      static let auto = PSM_AUTO
      static let singleColumn = PSM_SINGLE_COLUMN
      static let singleBlockVerticalText = PSM_SINGLE_BLOCK_VERT_TEXT
      static let singleBlock = PSM_SINGLE_BLOCK
      static let singleLine = PSM_SINGLE_LINE
      static let singleWord = PSM_SINGLE_WORD
      static let circleWord = PSM_CIRCLE_WORD
      static let singleCharacter = PSM_SINGLE_CHAR
      static let sparseText = PSM_SPARSE_TEXT
      static let sparseTextOsd = PSM_SPARSE_TEXT_OSD
      static let count = PSM_COUNT
    }
    
    public extension Tesseract {
      var pageSegmentationMode: PageSegmentationMode {
        get {
          perform { tessPointer in
            TessBaseAPIGetPageSegMode(tessPointer)
          }
        }
        set {
          perform { tessPointer in
            TessBaseAPISetPageSegMode(tessPointer, newValue)
          }
        }
      }
    }
    

    Usage:

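    func japaneseOCR() {
        // Uses the vertical Japanese model added in step 4
        let tesseract = Tesseract(languages: [.custom("jpn_vert")])
        
        // Treat the image as a single block of vertically aligned text
        tesseract.pageSegmentationMode = .singleBlockVerticalText
        
        guard let image = UIImage(named: imageName) else {
            print("Failed to load image")
            return
        }
        
        guard let imageData = image.jpegData(compressionQuality: 1.0) else {
            print("Failed to encode image as JPEG data")
            return
        }
    
        let result: Result<String, Tesseract.Error> = tesseract.performOCR(on: imageData)
    
        // Report failures instead of silently discarding them
        switch result {
        case .success(let text):
            self.recognizeText = text
        case .failure(let error):
            print("Tesseract OCR failed: \(error)")
            self.recognizeText = ""
        }
    }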
    

    Result:

    [Image: result 1] [Image: result 2]