I found this tutorial, https://www.youtube.com/watch?v=KE4xEzFGSU8, and tried to follow its instructions. I git cloned both tesseract and tesstrain.
I added heb.training_text from https://github.com/HayekZH/LangData_Tesseract/tree/master/heb, created the folders, and ran the Python script, which worked. But the training command:
TESSDATA_PREFIX=../tesseract/tessdata make training MODEL_NAME=Apex START_MODEL= eng TESSDATA=../tesseract/tessdata MAX_ITERATIONS=10000
doesn't even seem to be supported. I need a model trained for this Rashi font: https://github.com/googlefonts/mekorot
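Two possible causes, offered as guesses rather than a diagnosis: the `VAR=value command` prefix is POSIX shell syntax that cmd.exe and PowerShell do not understand, and the space in `START_MODEL= eng` hands make an empty START_MODEL plus a separate argument `eng`, which make treats as a target name. Below is a minimal Python sketch that sidesteps both by passing the environment variable and the make variables explicitly. The helper names `build_training_command` and `run_training` are hypothetical, and the sketch assumes `make` is on the PATH and that it is run from the folder containing the tesstrain checkout.

```python
import os
import subprocess


def build_training_command(model_name, start_model, tessdata, max_iterations):
    """Return (argv, env) for tesstrain's `make training` target."""
    argv = [
        "make",
        "training",
        f"MODEL_NAME={model_name}",
        f"START_MODEL={start_model}",  # no space after '='
        f"TESSDATA={tessdata}",
        f"MAX_ITERATIONS={max_iterations}",
    ]
    # Copy the current environment and add TESSDATA_PREFIX,
    # instead of relying on the shell's VAR=value prefix syntax.
    env = dict(os.environ)
    env["TESSDATA_PREFIX"] = tessdata
    return argv, env


def run_training(tesstrain_dir="tesstrain", **kwargs):
    """Run `make training` inside the tesstrain checkout (assumed location)."""
    argv, env = build_training_command(**kwargs)
    return subprocess.run(argv, cwd=tesstrain_dir, env=env, check=True)
```

A call would look like `run_training(model_name="Rashi", start_model="eng", tessdata="../tesseract/tessdata", max_iterations=10000)`. By default tesstrain's Makefile looks for ground truth in data/MODEL_NAME-ground-truth, so with a folder named Rashi-ground-truth the model name would presumably need to be Rashi rather than Apex.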
Edit:
These were the command and the script in the video:
TESSDATA_PREFIX=../tesseract/tessdata make training MODEL_NAME=Apex START_MODEL= eng TESSDATA=../tesseract/tessdata MAX_ITERATIONS=10000
import os
import pathlib
import random
import subprocess

training_text_file = 'langdata/heb.training_text'
output_directory = 'tesstrain/data/Rashi-ground-truth'

# Read the training text as UTF-8, one sample per line, skipping blank lines
with open(training_text_file, 'r', encoding='utf-8') as input_file:
    lines = [line.strip() for line in input_file if line.strip()]

# Create the ground-truth folder (and any missing parents) if needed
os.makedirs(output_directory, exist_ok=True)

# Keep a random sample of the lines
random.shuffle(lines)
count = 81
lines = lines[:count]

training_text_file_name = pathlib.Path(training_text_file).stem  # 'heb'

for line_count, line in enumerate(lines):
    # Write the ground-truth text for this line
    line_training_text = os.path.join(output_directory, f'{training_text_file_name}_{line_count}.gt.txt')
    with open(line_training_text, 'w', encoding='utf-8') as output_file:
        output_file.write(line)

    # Render the line with text2image, using the same base name as the .gt.txt file
    file_base_name = f'heb_{line_count}'
    subprocess.run([
        'text2image',
        '--font=Mekorot-Rashi Medium',  # name of the installed Rashi font
        f'--text={line_training_text}',
        f'--outputbase={output_directory}/{file_base_name}',
        '--max_pages=1',
        '--strip_unrenderable_words',
        '--leading=32',
        '--xsize=3600',
        '--ysize=480',
        '--char_spacing=1.0',
        '--exposure=0',
        '--unicharset_file=langdata/heb.unicharset'
    ])
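Before training, it can help to confirm that text2image actually produced an image and a box file for every ground-truth line, since a font name that is not found tends to leave the .gt.txt files without any rendered output. A minimal sketch of such a check follows. The function name `find_incomplete_samples` is hypothetical, and it assumes the layout used by the script above: `<base>.gt.txt`, `<base>.tif` and `<base>.box` all in one folder.

```python
from pathlib import Path


def find_incomplete_samples(ground_truth_dir):
    """Return (base name, missing extensions) for each .gt.txt lacking a .tif or .box."""
    directory = Path(ground_truth_dir)
    incomplete = []
    for gt_file in sorted(directory.glob("*.gt.txt")):
        # Strip the '.gt.txt' suffix to get the shared base name, e.g. 'heb_0'
        base = gt_file.name[: -len(".gt.txt")]
        missing = [
            ext for ext in (".tif", ".box")
            if not (directory / f"{base}{ext}").exists()
        ]
        if missing:
            incomplete.append((base, missing))
    return incomplete
```

Calling `find_incomplete_samples('tesstrain/data/Rashi-ground-truth')` returns an empty list when every line has both files. Any entries it does return point at lines that would need to be rendered again or removed.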
Use https://github.com/buliasz/tesstrain-windows-gui after generating the .tif files and the rest of the ground truth with the Python script from this video: https://www.youtube.com/watch?v=KE4xEzFGSU8. The GUI is really nice, but you will also need to install AutoHotkey to get it to run. That man deserves a coffee.