I've been trying to create a very simple web app that reads an EPUB and shows each chapter in its own section. I tried both epubjs and epub-parser, but every time, with several different .epub files, I get nothing usable back (errors, empty results, etc.). I ran the files through EPUB validators to make sure they were well formed, and they all passed. When I open the EPUBs with WinZip, their contents look as they should.
I'm desperate as I don't understand what's going wrong and why I'm not able to simply read an epub.
Below are the two functions I use for this. The problem starts at item.load(), which doesn't return the value/object I expect. Right now the error I get is:
Error extracting chapter content: TypeError: rawContent?.trim is not a function
so clearly rawContent isn't in the right format, but I couldn't figure out why or how to fix it.
I also attached the logs for book, spine, and metadata, though I couldn't identify anything wrong in them.
Any help or suggestions would be extremely appreciated. Thanks!
export class EpubService {
  private static async parseEpubContent(arrayBuffer: ArrayBuffer): Promise<{
    chapters: Chapter[];
    title: string;
  }> {
    try {
      const book = ePub(arrayBuffer);
      await book.ready;
      const spine = await book.loaded.spine;
      const metadata = await book.loaded.metadata;
      logger.info('book: ', book);
      logger.info('spine: ', spine);
      logger.info('metadata: ', metadata);
      if (!spine || spine.length === 0) {
        throw new ProcessingError('No chapters found in EPUB');
      }
      logger.info('Processing EPUB with spine length:', spine.length);
      const chapters: Chapter[] = [];
      const maxChapters = Math.min(spine.length, 5);
      for (let i = 0; i < maxChapters; i++) {
        const item = spine.get(i);
        if (!item) {
          logger.warn(`No spine item found at index ${i}`);
          continue;
        }
        try {
          logger.info(`Processing chapter ${i + 1}/${maxChapters}`);
          const content = await item.load();
          if (!content) {
            logger.warn(`No content loaded for chapter ${i + 1}`);
            continue;
          }
          const { title, content: extractedContent } =
            extractChapterContent(content);
          chapters.push({
            id: item.idref || String(i + 1),
            title: title || `Chapter ${i + 1}`,
            content: extractedContent,
            summary: '',
            status: 'pending',
          });
          logger.info(`Successfully processed chapter ${i + 1}`);
        } catch (error) {
          logger.error(`Error processing chapter ${i}:`, error);
          // Continue with next chapter
          continue;
        }
      }
      if (chapters.length === 0) {
        throw new ProcessingError('No valid chapters found in EPUB');
      }
      logger.info(`Successfully extracted ${chapters.length} chapters`);
      return {
        chapters,
        title: metadata?.title || 'Untitled Book',
      };
    } catch (error) {
      logger.error('Error parsing EPUB content:', error);
      throw error instanceof ProcessingError
        ? error
        : new ProcessingError('Failed to parse EPUB content');
    }
  }
  static async processEpubFile(
    file: File,
    signal?: AbortSignal
  ): Promise<{ book: Book; cleanup: () => Promise<void> }> {
    let filePath: string | undefined;
    try {
      logger.info('Starting EPUB file processing');
      // Upload to Supabase
      filePath = await StorageService.uploadFile(file);
      logger.info('File uploaded to Supabase');
      // Download for processing
      const arrayBuffer = await StorageService.downloadFile(filePath);
      logger.info('File downloaded from Supabase');
      // Parse EPUB content
      const { chapters, title } = await this.parseEpubContent(arrayBuffer);
      const cleanup = async () => {
        if (filePath) {
          await StorageService.deleteFile(filePath).catch((error) => {
            logger.error('Error cleaning up file:', error);
          });
        }
      };
      return {
        book: { title, chapters },
        cleanup,
      };
    } catch (error) {
      // Clean up on error
      if (filePath) {
        await StorageService.deleteFile(filePath).catch((error) => {
          logger.error('Error cleaning up file:', error);
        });
      }
      logger.error('Error processing EPUB:', error);
      throw error instanceof ProcessingError
        ? error
        : new ProcessingError('Failed to process EPUB file');
    }
  }
}
export const extractChapterContent = (rawContent: string): ExtractedContent => {
  try {
    if (!rawContent?.trim()) {
      throw new ProcessingError('Empty raw content provided');
    }
    const parser = new DOMParser();
    const doc = parser.parseFromString(rawContent, 'text/html');
    // Check for parsing errors
    const parserError = doc.querySelector('parsererror');
    if (parserError) {
      throw new ProcessingError('Failed to parse HTML content');
    }
    const title = findTitle(doc);
    const content = findContent(doc);
    const cleanedContent = cleanContent(content);
    validateContent(cleanedContent);
    return {
      title: title || 'Untitled Chapter',
      content: cleanedContent,
    };
  } catch (error) {
    logger.error('Error extracting chapter content:', error);
    throw error instanceof ProcessingError
      ? error
      : new ProcessingError('Failed to extract chapter content');
  }
};
In the end, it looks like calling load() on the book itself, rather than on the spine item, works when you pass it the chapter's href:
const content = await book.load(item.href);
Hope that helps someone.
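
For anyone hitting the same TypeError: the message means the value handed to extractChapterContent was not a string. Optional chaining (rawContent?.trim) only guards against null and undefined; it does not check that a trim method exists. Below is a minimal sketch of a defensive conversion, assuming the loader may hand back a DOM Document or Element rather than a string (I haven't confirmed this for every epub.js version). toMarkupString is a hypothetical helper name, not part of epub.js.

```typescript
// Hypothetical helper (not part of epub.js): coerce whatever a loader
// returned into a markup string before calling string methods on it.
// Assumption: the value is a string, an Element-like object exposing
// outerHTML, or a Document-like object exposing documentElement.
function toMarkupString(content: unknown): string {
  // Already a string: nothing to convert.
  if (typeof content === 'string') {
    return content;
  }
  if (content !== null && typeof content === 'object') {
    const node = content as {
      outerHTML?: unknown;
      documentElement?: { outerHTML?: unknown } | null;
    };
    // Element-like: serialize the element itself.
    if (typeof node.outerHTML === 'string') {
      return node.outerHTML;
    }
    // Document-like: serialize its root element.
    const root = node.documentElement;
    if (root && typeof root.outerHTML === 'string') {
      return root.outerHTML;
    }
  }
  // Anything else is rejected with a readable message instead of
  // failing later with "trim is not a function".
  throw new TypeError(
    `Unsupported chapter content type: ${typeof content}`
  );
}
```

With something like this in place, the call site could become extractChapterContent(toMarkupString(content)), so the extractor always receives a string regardless of which load() variant produced the value.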