Tags: python, epub, ebooklib

Python: how to convert a MOBI file to text (or to an EPUB file)


I am having trouble converting a MOBI file to text in Python.

I found the mobi library (https://github.com/iscc/mobi), which should convert MOBI to EPUB, and then the ebooklib library, which works very well for converting EPUB files to text.

The problem is that only ebooklib seems to work properly. If I give it a native EPUB file, everything works correctly. But if I pass it the file path returned by the mobi library, I get a bunch of errors that don't make much sense.

I don't know what exactly is causing this. Maybe my MOBI files are encrypted somehow? (They are original books that I bought from Humble Bundle several months ago.) But the mobi library does not raise any error about this.

Or maybe I cannot pass the file path generated by the mobi library as it is? Maybe I need to save the file or move it to another folder before ebooklib can read it?

My code looks like this:

import mobi

import ebooklib
from ebooklib import epub

tempdir, filepath = mobi.extract("book.mobi")

# This raises an error:
book = epub.read_epub(filepath)

# A native EPUB file works fine:
book = epub.read_epub("book.epub")

The error message doesn't tell me much:

Traceback (most recent call last):
  File "/ebooklib/utils.py", line 35, in parse_string
tree = etree.parse(io.BytesIO(s.encode('utf-8')))
AttributeError: 'bytes' object has no attribute 'encode'

Solution

  • You can save it as an HTML file:

    pip install mobi
    

    then

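    # Run this in a Jupyter/IPython cell: "!" runs a shell command,
    # and {name} substitutes the value of a Python variable.
    filepath = "./example.mobi"
    folder = "./"
    
    !mobiunpack -r {filepath} {folder}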
    

    The list of all options is available here

    Alternatively, here is another method:

    pip install mobi
    pip install html2text
    
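    import mobi
    import html2text
    
    filename = "test.mobi"
    # mobi.extract unpacks the book into a temporary directory
    tempdir, filepath = mobi.extract(filename)
    # Read the extracted file and convert its HTML to plain text.
    # UTF-8 is an assumption; change the encoding if your book differs.
    with open(filepath, "r", encoding="utf-8") as file:
        content = file.read()
    print(html2text.html2text(content))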
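A slightly more defensive version of the second method, as a rough sketch rather than a definitive implementation. As far as I can tell from the mobi project's README, mobi.extract can return an EPUB, HTML or PDF file depending on the type of the MOBI book, and it leaves the temporary directory for you to delete. The sketch checks the extension before handing the file to html2text and cleans up afterwards. The function names kind_of_extracted and mobi_to_text are my own (hypothetical), and the UTF-8 encoding is an assumption.

```python
import shutil
from pathlib import Path


def kind_of_extracted(filepath):
    """Classify the file returned by mobi.extract by its extension."""
    suffix = Path(filepath).suffix.lower()
    kinds = {".epub": "epub", ".html": "html", ".htm": "html", ".pdf": "pdf"}
    return kinds.get(suffix, "unknown")


def mobi_to_text(mobi_path):
    """Return the book as plain text; raise if the extracted file is not HTML."""
    # Third-party packages (pip install mobi html2text), imported here so
    # that kind_of_extracted can be used without them.
    import html2text
    import mobi

    tempdir, filepath = mobi.extract(mobi_path)
    try:
        kind = kind_of_extracted(filepath)
        if kind != "html":
            # An EPUB or PDF result needs a different reader than html2text.
            raise ValueError(f"mobi.extract produced a {kind} file: {filepath}")
        # Assumption: the extracted HTML is UTF-8 encoded.
        with open(filepath, "r", encoding="utf-8") as f:
            return html2text.html2text(f.read())
    finally:
        # The temporary directory is not removed automatically.
        shutil.rmtree(tempdir, ignore_errors=True)
```

Calling print(mobi_to_text("test.mobi")) then behaves like the snippet above for books that unpack to HTML. For a book that unpacks to EPUB, the ValueError message tells you which file was produced, so you know that a different reader is needed.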