I'm having trouble converting a MOBI file to text in Python.
I found this library, https://github.com/iscc/mobi, which should convert MOBI to EPUB, and then I found the ebooklib library, which works very well for converting EPUB files to text.
The thing is that only ebooklib seems to work properly. If I give it a native EPUB file, everything works correctly. But if I pass it the filepath returned by the mobi library, I get a bunch of errors that don't make much sense.
I don't know exactly what is causing this. Maybe my MOBI files are encrypted somehow? (They are original books from Humble Bundle that I bought several months ago.) But the mobi library isn't raising any error about this.
Or maybe I can't just pass the filepath generated by the mobi library as it is? Maybe I should somehow save this file, move it to another folder, and only then will it be "readable" by ebooklib?
My code looks like this:
import mobi
import ebooklib
from ebooklib import epub
tempdir, filepath = mobi.extract("book.mobi")
# This throws error:
book = epub.read_epub(filepath)
# Native, normal epub file is working ok:
book = epub.read_epub("book.epub")
The error doesn't tell me much, in my opinion:
Traceback (most recent call last):
File "/ebooklib/utils.py", line 35, in parse_string
tree = etree.parse(io.BytesIO(s.encode('utf-8')))
AttributeError: 'bytes' object has no attribute 'encode'
You can save it as an HTML file:
pip install mobi
then
import mobi
filepath="./example.mobi"
folder="./"
# In IPython/Jupyter, braces substitute the Python variables into the shell command
!mobiunpack -r {filepath} {folder}
A list of all options is available here.
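The `!` prefix only works inside IPython or Jupyter. In a plain Python script, the same command could be run through `subprocess`. This is just a minimal sketch: it assumes the `mobiunpack` command installed by `pip install mobi` is on your PATH, and the helper names `build_unpack_command` and `unpack` are made up for illustration.

```python
import os
import subprocess


def build_unpack_command(mobi_path, out_dir):
    """Build the argument list for the same call as above: mobiunpack -r <file> <folder>."""
    return ["mobiunpack", "-r", os.fspath(mobi_path), os.fspath(out_dir)]


def unpack(mobi_path, out_dir):
    """Run mobiunpack; assumes the command is available on PATH."""
    # check=True raises CalledProcessError if the tool exits with a non-zero status
    subprocess.run(build_unpack_command(mobi_path, out_dir), check=True)


if __name__ == "__main__":
    # Only print the command here; call unpack("./example.mobi", "./") to actually run it
    print(build_unpack_command("./example.mobi", "./"))
```

Passing the arguments as a list avoids shell quoting problems when the file name contains spaces.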
Alternatively, here is another method:
pip install mobi
pip install html2text
import mobi
import html2text

filename = "test.mobi"
tempdir, filepath = mobi.extract(filename)
# Read the extracted HTML file and convert it to plain text
with open(filepath, "r", encoding="utf-8") as file:
    content = file.read()
print(html2text.html2text(content))
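One thing to watch out for: as far as I know, `mobi.extract` can return a path to an EPUB, HTML or PDF file depending on the MOBI type, so it may be worth checking the extension before reading the file as HTML, and deleting the temporary folder afterwards. Below is a rough sketch of that idea. It uses the standard library's `html.parser` instead of html2text so the text extraction has no extra dependency, and the names `html_to_text` and `mobi_to_text` are my own, not part of either library.

```python
import shutil
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects text nodes and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.parts.append(text)


def html_to_text(html):
    """Return the visible text of an HTML string, one text node per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def mobi_to_text(mobi_path):
    """Hypothetical helper: extract a MOBI file and return its text if the result is HTML."""
    import mobi  # third-party: pip install mobi

    tempdir, filepath = mobi.extract(mobi_path)
    try:
        suffix = Path(filepath).suffix.lower()
        if suffix not in (".html", ".htm"):
            # Assumption: EPUB or PDF output needs a different reader
            raise ValueError(f"extracted a {suffix!r} file, not HTML")
        html = Path(filepath).read_text(encoding="utf-8", errors="replace")
        return html_to_text(html)
    finally:
        # Remove the temporary folder created by mobi.extract
        shutil.rmtree(tempdir, ignore_errors=True)
```

If the check raises for your book, the extracted file is probably an EPUB, which is the case the question is struggling with.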