I am trying to create a Kindle dictionary that can be used for offline lookup. I already have the words and their inflections, but turning this into a working dictionary is difficult.
There is some documentation about this provided by Amazon. It basically says that you should:
So I created a large XHTML file (23 MB or so) according to the Amazon specifications and opened it in Kindle Previewer, and it looked fine. However, Kindle Previewer does not let you export XHTML files to MOBI. They want you to create an intermediate epub file.
I tried using Pandoc to do the conversion, which did not work because it stripped out all the specific HTML tags and only left in paragraphs. Then I tried using calibre. The normal XHTML -> epub conversion failed because the XHTML file was too large, according to an error message. Calibre suggests to turn on the "heuristic mode" if you run into this error, which I tried, but which did not finish running after hours of runtime.
Then I attempted to create the epub file myself, using a sample file taken from this tutorial. I discovered that this is not trivial, and a check using epubcheck revealed many hard-to-understand errors in my generated file. The generation of the epub file is also a bit complicated by the fact that you probably need to split the XHTML files into many smaller files, which should maybe be 250 kb in size, because e-readers tend to struggle with parsing larger files.
So I thought there should maybe be an easier way to do this, or maybe a library that helps doing this. Maybe it would even be a good idea to output the words + inflections into some other easier dictionary format and then convert it to a MOBI using an existing library and leaving out the XHTML generation completely. Currently I am using Python, but I'd also use other languages if it is necessary. What could I try?
Edit: To add to the things I have tried: there is an apparently closed source script here that unfortunately doesn't support inflections, so does not work. And there are instructions here that advise converting the file to PRC using Mobipocket Creator and then opening it with Kindle Previewer. The problem with this approach is that Kindle Previewer throws the error:
Kindle Previewer does not support this file, which has either been created using an older version of KindleGen or a third party application. We recommend using EPUB or DOCX format directly for previewing and publishing your book on Kindle.
There are also more detailed instructions for Mobipocket Creator here, which tell you to directly move the generated .prc file onto the kindle. I tried that but it is not being recognized as a dictionary.
I figured it out by myself. First I implemented a solution myself, then I found the pyglossary library (right now the code below only works with the version from Github and not from pip) and used it like this:
from pyglossary.glossary import Glossary
Glossary.init()
glos = Glossary()
defiFormat = "h"
base_forms = get_base_forms()
for canonical_form in base_forms:
inflections = get_inflections(canonical_form)
definitions = get_definition(canonical_form)
definitionhtml = ""
for definition in definitions:
definitionhtml += "<p>" + gloss + "</p>"
all_forms = [canonical_form]
all_forms.extend(inflections)
glos.addEntryObj(glos.newEntry(all_forms, glosshtml, defiFormat))
glos.setInfo("title", "Russian-English Dictionary")
glos.setInfo("author", "Vuizur")
glos.sourceLangName = "Russian"
glos.targetLangName = "English"
glos.write("test.mobi", format="Mobi", keep=True, kindlegen_path="path/to/kindlegen.exe")