
Extracting structural data from ODP or ODF files


I'm trying to extract the information hierarchy from ODP (OpenDocument Presentation) files: titles, subtitles, body text...

Do you know of any tool or technique that would do the job?

Failing that, is there a way to parse ODP documents and extract their styling information, so that I can later deduce the document structure from the styling?

I'm afraid the structure of the XML inside an ODP file may depend on the software or version that produced it, so I'd rather find a high-level solution than parse that XML directly.


Solution

  • As I couldn't find any tool that could extract the outline, titles, text... from presentation files, I created Exide, an open source API that supports ODP, PPTX and Beamer files.

    For more information, check out the GitHub page of the project.
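
    If adding a dependency is not an option, here is a rough sketch of the low-level route using only the Python standard library. It is not how Exide works internally, just an illustration. It relies on two things the OpenDocument format defines: an ODP file is a ZIP archive containing content.xml, and placeholder frames can carry a presentation:class attribute (title, subtitle, outline, notes...). The function name extract_outline and the "unclassified" label are my own, and I'm assuming the producing software actually sets presentation:class, which is not guaranteed for every text box.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs defined by the OpenDocument format.
NS = {
    "draw": "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0",
    "presentation": "urn:oasis:names:tc:opendocument:xmlns:presentation:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}


def _qname(prefix, local):
    """Build the '{uri}local' form that ElementTree uses for qualified names."""
    return "{%s}%s" % (NS[prefix], local)


def extract_outline(source):
    """Return one dict per slide with its (role, text) paragraphs.

    `source` is a path or a binary file-like object for an .odp file.
    Hypothetical helper written for this answer, not part of any library.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("content.xml"))

    slides = []
    for page in root.iter(_qname("draw", "page")):
        items = []
        for frame in page.iter(_qname("draw", "frame")):
            # Frames without presentation:class (e.g. free text boxes)
            # get a made-up "unclassified" role.
            role = frame.get(_qname("presentation", "class"), "unclassified")
            for para in frame.iter(_qname("text", "p")):
                text = "".join(para.itertext()).strip()
                if text:
                    items.append((role, text))
        slides.append({"name": page.get(_qname("draw", "name")), "items": items})
    return slides
```

    Calling extract_outline("slides.odp") (the file name is a placeholder) gives a list of slides, each with its paragraphs labelled by role. Speaker notes live inside the slide element too, so they show up with the role "notes". Nesting depth of outline bullets is not preserved here; that would require tracking the enclosing text:list elements.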