java saxon jaxp

JAXP with Saxon-HE: StreamSource on an XML file doesn't release the file after a parsing error


I'm using the JAXP API together with Saxon-HE. The goal is an application that transforms XML files using configurable XSLT stylesheets and can override the generated output documents. I'll skip the details, because I created an example project to illustrate the issue I ran into:

Use case: when a transformation error occurs, moving the XML file to another directory (for example, an error directory) raises an access exception.

When I instantiate the StreamSource from a File instance (pointing to the XML file) and certain parsing errors occur, moving the file fails with the exception "The process cannot access the file because it is being used by another process."

Here is a single-class application I wrote to illustrate the issue:

package com.sample.xslt.application;

import net.sf.saxon.Configuration;
import net.sf.saxon.lib.FeatureKeys;

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;

public class XsltApplicationSample {

  public static void main(String[] args) throws Exception {

    if (args.length != 2) {
      throw new RuntimeException("Two arguments are expected : <xslFilePath> <inputFilePath>");
    }
    String xslFilePath = args[0];
    String xmlFilePath = args[1];

    TransformerFactory factory = TransformerFactory.newInstance();
    factory.setAttribute(FeatureKeys.ALLOW_MULTITHREADING, Boolean.TRUE);
    factory.setAttribute(FeatureKeys.RECOVERY_POLICY,
        Integer.valueOf(Configuration.RECOVER_WITH_WARNINGS));

    Source xslSource = new StreamSource(new File(xslFilePath));
    Source xmlSource = new StreamSource(new File(xmlFilePath));
    Transformer transformer = factory.newTransformer(xslSource);

    try {
      transformer.transform(xmlSource, new DOMResult());

    } catch (TransformerException e) {
      System.out.println(e.getMessage());
    }

    // move input file to tmp directory (for example, could be configured error dir)

    File srcFile = Paths.get(xmlFilePath).toFile();
    File tempDir = new File(System.getProperty("java.io.tmpdir"));

    Path destFilePath = new File(tempDir, srcFile.getName()).toPath();

    try {
      Files.move(srcFile.toPath(), destFilePath, StandardCopyOption.REPLACE_EXISTING);
    } catch (SecurityException | IOException e) {
      System.out.println(e.getMessage());
    }
  }
}

The configured XSLT stylesheet must be valid to reproduce the issue. If the input XML file is empty, a transformation/parsing error still occurs, but the file access error does not.

Example input file that reproduces the issue:

<root>
    <elem>
</root>

Example of STDOUT:

JAXP: find factoryId =javax.xml.transform.TransformerFactory
JAXP: find factoryId =javax.xml.parsers.SAXParserFactory
JAXP: loaded from fallback value: com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl
JAXP: created new instance of class com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl using ClassLoader: null
JAXP: find factoryId =javax.xml.parsers.SAXParserFactory
JAXP: loaded from fallback value: com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl
JAXP: created new instance of class com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl using ClassLoader: null
JAXP: find factoryId =javax.xml.parsers.DocumentBuilderFactory
JAXP: loaded from fallback value: com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
JAXP: created new instance of class com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl using ClassLoader: null
JAXP: find factoryId =javax.xml.parsers.SAXParserFactory
JAXP: loaded from fallback value: com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl
JAXP: created new instance of class com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl using ClassLoader: null
Error on line 3 column 3 of input_err.xml:
  SXXP0003: Error reported by XML parser: The element type "elem" must be terminated by the
  matching end-tag "</elem>".
org.xml.sax.SAXParseException; systemId: file:/C:/<path>/input_err.xml; lineNumber: 3; columnNumber: 3; The element type "elem" must be terminated by the matching end-tag "</elem>".
C:\<path>\input_err.xml -> C:\<path>\AppData\Local\Temp\input_err.xml: The process cannot access the file because it is being used by another process.

Command line used (launched from Eclipse):

java ... -Djaxp.debug=1 -Dfile.encoding=UTF-8 -classpath <...> com.sample.xslt.application.XsltApplicationSample C:\<path>\transform.xsl C:\<path>\input_err.xml

pom.xml used:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.sample</groupId>
    <artifactId>XsltExampleProject</artifactId>
    <version>1.0.0-SNAPSHOT</version>

    <name>XsltExampleProject</name>
    <description>XSLT example project</description>

    <dependencies>
        <dependency>
            <groupId>net.sf.saxon</groupId>
            <artifactId>Saxon-HE</artifactId>
            <version>9.7.0-7</version>
        </dependency>

        <dependency>
            <groupId>commons-io</groupId>
            <artifactId>commons-io</artifactId>
            <version>2.5</version>
        </dependency>

        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-lang3</artifactId>
            <version>3.2.1</version>
        </dependency>
    </dependencies>

    <build>
        <sourceDirectory>src</sourceDirectory>
        <plugins>
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.3</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                    <encoding>UTF-8</encoding>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

The workaround I used is to load the content of the XML input file into memory as a String, as follows:

// Requires org.apache.commons.io.FileUtils, java.io.StringReader and java.nio.charset.StandardCharsets
String xmlContent = FileUtils.readFileToString(new File(xmlFilePath), StandardCharsets.UTF_8);

Source xslSource = new StreamSource(new File(xslFilePath));
Source xmlSource = new StreamSource(new StringReader(xmlContent));
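A variation on this workaround, shown here only as a minimal sketch, reads the file as raw bytes with the JDK alone. Handing the parser bytes rather than a decoded String leaves encoding detection to the parser, so a file that declares an encoding other than UTF-8 should still be read correctly. The class name InMemorySourceSketch and the demo document are hypothetical, and the sketch assumes the input is small enough to fit in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of the original project
final class InMemorySourceSketch {

  /** Reads the whole file into memory and wraps the bytes in a StreamSource. */
  static StreamSource load(Path xmlFile) throws IOException {
    // Files.readAllBytes opens and closes the file before returning
    byte[] bytes = Files.readAllBytes(xmlFile);
    StreamSource source = new StreamSource(new ByteArrayInputStream(bytes));
    // The system ID is only used for error messages and relative URI resolution
    source.setSystemId(xmlFile.toUri().toString());
    return source;
  }

  public static void main(String[] args) throws Exception {
    // Hypothetical demo input: a small well-formed document
    Path xmlFile = Files.createTempFile("input", ".xml");
    Files.write(xmlFile, "<root><elem/></root>".getBytes(StandardCharsets.UTF_8));

    StreamSource xmlSource = load(xmlFile);

    // The file is no longer needed once its bytes are in memory
    Files.delete(xmlFile);

    // An identity transformer stands in for the configured stylesheet
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.transform(xmlSource, new DOMResult());
  }
}
```

Because the parser never opens the file itself, it cannot keep it locked, whatever happens during parsing.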

Am I missing something when initializing the Transformer? Should the default SAX parser be overridden with another one recommended by Saxon? The debug logging suggests that the JDK's built-in Xerces parser is used, but is it fully compatible with the transformer implementation provided by Saxon? I'm a bit confused by this one.

Thanks for your help !


Solution

  • From the comment thread following the question, it appears to be a bug/defect in the XML parser supplied with the JDK. Your options are:

    (a) report the bug and wait very patiently for it to be fixed

    (b) use the Apache Xerces parser instead

    (c) instead of supplying a File, supply a FileInputStream, and close it yourself.

    My recommendation would be (b), since the Apache Xerces parser is much more reliable than the version in the JDK.
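Option (c) could look roughly like the following sketch. It is not taken from the original project: the class name ClosedStreamTransformSketch, the helper method, and the demo input are hypothetical, and an identity transformer stands in for the configured stylesheet so that the example runs with whichever TransformerFactory is on the classpath. The point is only that the caller opens the stream and try-with-resources closes it, whether or not the parser fails.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of the original project
final class ClosedStreamTransformSketch {

  /** Returns true if the transformation succeeded, false if it failed. */
  static boolean transform(Transformer transformer, Path xmlFile) throws IOException {
    // The caller owns the stream: it is closed on exit from this block,
    // including when the parser reports a fatal error
    try (InputStream in = new FileInputStream(xmlFile.toFile())) {
      // The system ID keeps error messages and relative URIs meaningful
      StreamSource xmlSource = new StreamSource(in, xmlFile.toUri().toString());
      transformer.transform(xmlSource, new DOMResult());
      return true;
    } catch (TransformerException e) {
      System.out.println(e.getMessage());
      return false;
    }
  }

  public static void main(String[] args) throws Exception {
    // Hypothetical demo input: the malformed document from the question
    Path xmlFile = Files.createTempFile("input_err", ".xml");
    Files.write(xmlFile, "<root>\n    <elem>\n</root>".getBytes(StandardCharsets.UTF_8));

    // An identity transformer stands in for the configured stylesheet
    Transformer transformer = TransformerFactory.newInstance().newTransformer();

    if (!transform(transformer, xmlFile)) {
      // The stream is already closed here, so the move should not be blocked by it
      Path errorDir = Files.createTempDirectory("error");
      Files.move(xmlFile, errorDir.resolve(xmlFile.getFileName()),
          StandardCopyOption.REPLACE_EXISTING);
    }
  }
}
```

In the application from the question, the same pattern would replace the StreamSource built from new File(xmlFilePath), with the Files.move call left after the try block.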