web-applications, epub, epub.js

Reading an EPUB with epubjs or epub-parser always returns empty


I've been trying to create a very simple web app that reads an EPUB and shows each chapter in its own section. I tried both epubjs and epub-parser, but every time, with several different .epub files, I get nothing usable (errors, empty results, etc.). I ran the files through validators to make sure they are well formed, and they all pass. When I open any of them with WinZip, the contents look as they should.

I'm desperate as I don't understand what's going wrong and why I'm not able to simply read an epub.

Below are the two functions I use for this. The problem starts at item.load(), which doesn't return the value I expect. The error I currently get is:

Error extracting chapter content: TypeError: rawContent?.trim is not a function

So clearly rawContent isn't a string, but I couldn't figure out why or how to fix it.
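One way to narrow this down is to log the runtime type of whatever load() resolves to before passing it on. The following is only a minimal sketch; describeValue is a hypothetical helper name and is not part of epubjs or epub-parser.

```typescript
// Hypothetical debugging helper: returns a short label for the runtime
// type of a value, so a log line shows whether it is a string, a DOM
// node, or something else entirely.
function describeValue(value: unknown): string {
  if (value === null) return 'null';
  // Primitives (string, number, undefined, ...) are reported via typeof.
  if (typeof value !== 'object') return typeof value;
  // For objects, prefer the constructor name when one is available.
  const ctor = (value as object).constructor;
  return ctor && ctor.name ? ctor.name : 'object';
}
```

Logging describeValue(content) right after the await would show whether the value is a 'string' or some object type, which is the first thing to check given the trim error.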

I also attached the logs for book, spine, and metadata, though I couldn't identify anything wrong.

Any help or suggestions would be extremely appreciated. Thanks!

Book & Spine & Metadata

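import ePub from 'epubjs';
// logger, ProcessingError, StorageService, Chapter and Book are project-local
// (their imports are not shown here).

export class EpubService {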
  private static async parseEpubContent(arrayBuffer: ArrayBuffer): Promise<{
    chapters: Chapter[];
    title: string;
  }> {
    try {
      const book = ePub(arrayBuffer);
      await book.ready;

      const spine = await book.loaded.spine;
      const metadata = await book.loaded.metadata;

      logger.info('book: ', book);
      logger.info('spine: ', spine);
      logger.info('metadata: ', metadata);

      if (!spine || spine.length === 0) {
        throw new ProcessingError('No chapters found in EPUB');
      }

      logger.info('Processing EPUB with spine length:', spine.length);

      const chapters: Chapter[] = [];
      const maxChapters = Math.min(spine.length, 5);

      for (let i = 0; i < maxChapters; i++) {
        const item = spine.get(i);
        if (!item) {
          logger.warn(`No spine item found at index ${i}`);
          continue;
        }

        try {
          logger.info(`Processing chapter ${i + 1}/${maxChapters}`);

          const content = await item.load();
          if (!content) {
            logger.warn(`No content loaded for chapter ${i + 1}`);
            continue;
          }

          const { title, content: extractedContent } =
            extractChapterContent(content);

          chapters.push({
            id: item.idref || String(i + 1),
            title: title || `Chapter ${i + 1}`,
            content: extractedContent,
            summary: '',
            status: 'pending',
          });

          logger.info(`Successfully processed chapter ${i + 1}`);
        } catch (error) {
          logger.error(`Error processing chapter ${i + 1}:`, error);
          // Continue with next chapter
          continue;
        }
      }

      if (chapters.length === 0) {
        throw new ProcessingError('No valid chapters found in EPUB');
      }

      logger.info(`Successfully extracted ${chapters.length} chapters`);

      return {
        chapters,
        title: metadata?.title || 'Untitled Book',
      };
    } catch (error) {
      logger.error('Error parsing EPUB content:', error);
      throw error instanceof ProcessingError
        ? error
        : new ProcessingError('Failed to parse EPUB content');
    }
  }

  static async processEpubFile(
    file: File,
    signal?: AbortSignal
  ): Promise<{ book: Book; cleanup: () => Promise<void> }> {
    let filePath: string | undefined;

    try {
      logger.info('Starting EPUB file processing');

      // Upload to Supabase
      filePath = await StorageService.uploadFile(file);
      logger.info('File uploaded to Supabase');

      // Download for processing
      const arrayBuffer = await StorageService.downloadFile(filePath);
      logger.info('File downloaded from Supabase');

      // Parse EPUB content
      const { chapters, title } = await this.parseEpubContent(arrayBuffer);

      const cleanup = async () => {
        if (filePath) {
          await StorageService.deleteFile(filePath).catch((error) => {
            logger.error('Error cleaning up file:', error);
          });
        }
      };

      return {
        book: { title, chapters },
        cleanup,
      };
    } catch (error) {
      // Clean up on error
      if (filePath) {
        await StorageService.deleteFile(filePath).catch((error) => {
          logger.error('Error cleaning up file:', error);
        });
      }

      logger.error('Error processing EPUB:', error);
      throw error instanceof ProcessingError
        ? error
        : new ProcessingError('Failed to process EPUB file');
    }
  }
}

// Parses raw chapter markup and extracts a title and cleaned text content.
export const extractChapterContent = (rawContent: string): ExtractedContent => {
  try {
    if (!rawContent?.trim()) {
      throw new ProcessingError('Empty raw content provided');
    }

    const parser = new DOMParser();
    const doc = parser.parseFromString(rawContent, 'text/html');

    // Check for parsing errors
    const parserError = doc.querySelector('parsererror');
    if (parserError) {
      throw new ProcessingError('Failed to parse HTML content');
    }

    const title = findTitle(doc);
    const content = findContent(doc);
    const cleanedContent = cleanContent(content);

    validateContent(cleanedContent);

    return {
      title: title || 'Untitled Chapter',
      content: cleanedContent,
    };
  } catch (error) {
    logger.error('Error extracting chapter content:', error);
    throw error instanceof ProcessingError
      ? error
      : new ProcessingError('Failed to extract chapter content');
  }
};

Solution

  • In the end, calling load() on the book itself, rather than on the spine item, works when you pass it the chapter's href: const content = await book.load(item.href);

    Hope that helps someone.
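
    Depending on the epubjs version and on how the book was opened, the value resolved by load() may be a parsed document or element rather than a string. That is an assumption worth checking in your own setup, but it would explain the rawContent?.trim error from the question. Below is a minimal sketch of a defensive normalizer that accepts any of the three shapes. toMarkupString, ElementLike and DocumentLike are hypothetical names introduced here for illustration, not part of epubjs.

```typescript
// Structural stand-ins for the DOM types, so the sketch does not depend
// on a browser environment. Real Element and Document objects satisfy
// these shapes (assumption: the element exposes outerHTML).
interface ElementLike {
  outerHTML: string;
}
interface DocumentLike {
  documentElement: ElementLike;
}

function isElementLike(v: unknown): v is ElementLike {
  return (
    typeof v === 'object' &&
    v !== null &&
    typeof (v as { outerHTML?: unknown }).outerHTML === 'string'
  );
}

function isDocumentLike(v: unknown): v is DocumentLike {
  return (
    typeof v === 'object' &&
    v !== null &&
    isElementLike((v as { documentElement?: unknown }).documentElement)
  );
}

// Hypothetical helper: turns whatever load() resolved to into a markup
// string, or throws if the value has none of the expected shapes.
function toMarkupString(loaded: unknown): string {
  if (typeof loaded === 'string') return loaded;
  if (isDocumentLike(loaded)) return loaded.documentElement.outerHTML;
  if (isElementLike(loaded)) return loaded.outerHTML;
  throw new TypeError(`Unsupported chapter content type: ${typeof loaded}`);
}
```

    With something like this in place, extractChapterContent(toMarkupString(content)) receives a string regardless of which shape came back, so the trim() check and the DOMParser call both get the input they expect.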