scala apache-poi powerpoint microsoft-fabric

How to Create a PowerPoint Presentation from a Template using Apache POI on Microsoft Fabric Spark in Scala?


I want to read a PowerPoint POTX template with Apache POI, populate it as part of a notebook run, and write the resulting PPTX file to Azure Blob Storage. This runs on Spark in Microsoft's Fabric product. Fabric lets you upload Java dependencies to an Environment; you can create multiple Environments and choose which one to use when running the notebook.

I've created an Environment for POI 5.2.3. The libraries I've uploaded are

5.2.3
commons-codec-1.15.jar
commons-collections4-4.4.jar
commons-compress-1.21.jar
commons-io-2.11.0.jar
commons-math3-3.6.1.jar
curvesapi-1.07.jar
log4j-api-2.18.0.jar
poi-5.2.3.jar
poi-ooxml-5.2.3.jar
poi-ooxml-lite-5.2.3.jar
SparseBitSet-1.2.jar
xmlbeans-5.1.1.jar

With 5.2.3 the notebook runs without errors, but the output is not a valid PPTX file. PowerPoint reports:

PowerPoint found a problem with content in slides.pptx.
PowerPoint can attempt to repair the presentation. If you trust the source of this presentation, click Repair.

Repairing the file does not work. If I change the file extension to .ppt, PowerPoint loads the file OK.

I created a notebook from the "Create a new slide from a predefined slide layout" snippet in the XSLF Cookbook on the Apache POI website, adapted it to Scala, and added the part at the end that saves the presentation to a file.

import org.apache.poi.xslf.usermodel.{XMLSlideShow, XSLFSlide, SlideLayout}
import java.io.{FileInputStream,FileOutputStream}
import scala.jdk.CollectionConverters._

var ppt = new XMLSlideShow(new FileInputStream("/lakehouse/default/Files/template.potx"));

// first see what slide layouts are available :
println("Available slide layouts:");
for(master <- ppt.getSlideMasters().asScala){
    for(layout <- master.getSlideLayouts()){
        println(layout.getType());
    }
}

// blank slide
var blankSlide = ppt.createSlide();

// there can be multiple masters each referencing a number of layouts
// for demonstration purposes we use the first (default) slide master
var defaultMaster = ppt.getSlideMasters().get(0);

// title slide
var titleLayout = defaultMaster.getLayout(SlideLayout.CUST);
// fill the placeholders
var slide1 = ppt.createSlide(titleLayout);
var title1 = slide1.getPlaceholder(0);
title1.setText("First Title");

// title and content
var titleBodyLayout = defaultMaster.getLayout(SlideLayout.CUST);
var slide2 = ppt.createSlide(titleBodyLayout);

var title2 = slide2.getPlaceholder(0);
title2.setText("Second Title");

var body2 = slide2.getPlaceholder(1);
body2.clearText(); // unset any existing text
body2.addNewTextParagraph().addNewTextRun().setText("First paragraph");
body2.addNewTextParagraph().addNewTextRun().setText("Second paragraph");
body2.addNewTextParagraph().addNewTextRun().setText("Third paragraph");

var outStream = new FileOutputStream("/lakehouse/default/Files/Output/slides.pptx")
ppt.write(outStream)
outStream.close()

I was hoping that the output file would be in the XML pptx format. Is there something else I need to do to ensure that this is the case?


Solution

  • The code from "Create a new slide from a predefined slide layout" works if the source slideshow.pptx is saved as *.pptx after creating the new slides from predefined slide layouts. It cannot work with a *.potx source saved as *.pptx unless the content type is changed.

    In your case, starting with template.potx and writing slides.pptx cannot work without changing the content type. If you started with template.pptx and wrote slides.pptx, it would work. The same holds if you started with template.potx and wrote slides.potx.

    A PowerPoint template file (*.potx) and a PowerPoint slideshow file (*.pptx) have different content types, and these are stored within the files. The *.potx file has content type application/vnd.openxmlformats-officedocument.presentationml.template.main+xml. The *.pptx file has content type application/vnd.openxmlformats-officedocument.presentationml.presentation.main+xml. A mismatch between the stored content type and the file extension results in an error when opening the file in PowerPoint.

    If one needs to change the content type, then this could look like so:

    ...
    XMLSlideShow ppt = new XMLSlideShow(new FileInputStream("./template.potx"));
    ...
    ppt.getPackage().replaceContentType(
       "application/vnd.openxmlformats-officedocument.presentationml.template.main+xml",
       "application/vnd.openxmlformats-officedocument.presentationml.presentation.main+xml");
    
    FileOutputStream out = new FileOutputStream("./slides.pptx");
    ppt.write(out);
    out.close();
    ppt.close();
    ...
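
To check which of the two content types a given file actually carries, one can look at the [Content_Types].xml entry inside the package, since both *.potx and *.pptx are ZIP archives. The following is a minimal sketch that uses only the JDK and not Apache POI. The class ContentTypeCheck and its methods readContentTypes, describe and samplePackage are hypothetical names introduced for illustration, and the tiny in-memory packages built by samplePackage are stand-ins for real files, not valid presentations.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration; not part of Apache POI.
public class ContentTypeCheck {

    public static final String TEMPLATE =
        "application/vnd.openxmlformats-officedocument.presentationml.template.main+xml";
    public static final String PRESENTATION =
        "application/vnd.openxmlformats-officedocument.presentationml.presentation.main+xml";

    // Returns the text of [Content_Types].xml from an OOXML package,
    // or null if the package has no such entry.
    public static String readContentTypes(InputStream packageStream) {
        try (ZipInputStream zip = new ZipInputStream(packageStream)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("[Content_Types].xml".equals(entry.getName())) {
                    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                    byte[] chunk = new byte[4096];
                    int n;
                    while ((n = zip.read(chunk)) != -1) {
                        buffer.write(chunk, 0, n);
                    }
                    return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Classifies the package by a plain substring search for the main content type.
    public static String describe(String contentTypesXml) {
        if (contentTypesXml == null) return "unknown";
        if (contentTypesXml.contains(TEMPLATE)) return "template";
        if (contentTypesXml.contains(PRESENTATION)) return "presentation";
        return "unknown";
    }

    // Builds a stand-in ZIP holding only [Content_Types].xml (assumption: demo data,
    // not a presentation that PowerPoint could open).
    public static byte[] samplePackage(String mainContentType) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<Types xmlns=\"http://schemas.openxmlformats.org/package/2006/content-types\">"
            + "<Override PartName=\"/ppt/presentation.xml\" ContentType=\""
            + mainContentType + "\"/></Types>";
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("[Content_Types].xml"));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // Inspect a real file, e.g. the slides.pptx written by the notebook.
            try (InputStream in = new FileInputStream(args[0])) {
                System.out.println(describe(readContentTypes(in)));
            }
            return;
        }
        // Without arguments, run against the two stand-in packages.
        System.out.println(describe(readContentTypes(
            new ByteArrayInputStream(samplePackage(TEMPLATE)))));     // prints "template"
        System.out.println(describe(readContentTypes(
            new ByteArrayInputStream(samplePackage(PRESENTATION))))); // prints "presentation"
    }
}
```

Pointed at a file written straight from template.potx, this should report template even when the file is named slides.pptx. After the replaceContentType call shown above it should report presentation.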