
How to load and parse SVG documents


Background

There are a number of unanswered questions related to reading and parsing SVG paths.

Problem

The SVG path element contains a data attribute (d). Sometimes it is necessary to load, parse, and extract just the path information from an SVG file.

Question

How do you load, parse, and extract SVG path information from an SVG file?


Solution

  • Overview

    Load and parse SVG files using Apache Batik (or EchoSVG). The solution shows Java code from the preliminary stages of converting an SVG file to MetaPost. This should give a general idea of how to load, parse, and extract content from SVG files using Java.
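
    For contrast, here is a minimal sketch of what is possible without Batik. It uses only the JDK's XML parser to collect the raw d attribute of each path element. The class name RawPathData and the inline SVG are made up for illustration. The result is an unparsed string that may mix relative commands and shorthand; turning it into typed, normalized segments is the part Batik handles in the solution that follows.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/**
 * Hypothetical helper (not part of Batik): collects the raw "d" attribute
 * of every SVG path element using only the JDK's XML parser.
 */
class RawPathData {
  private static final String SVG_NS = "http://www.w3.org/2000/svg";

  /**
   * @param svg SVG markup as a string (assumed to declare the SVG namespace).
   * @return The unparsed "d" values, in document order.
   */
  static List<String> extract( String svg ) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware( true );
      DocumentBuilder builder = factory.newDocumentBuilder();
      Document document = builder.parse(
        new ByteArrayInputStream( svg.getBytes( StandardCharsets.UTF_8 ) ) );

      NodeList paths = document.getElementsByTagNameNS( SVG_NS, "path" );
      List<String> result = new ArrayList<>();

      for( int i = 0; i < paths.getLength(); i++ ) {
        // getAttribute returns an empty string when "d" is missing.
        result.add( ((Element)paths.item( i )).getAttribute( "d" ) );
      }

      return result;
    } catch( Exception e ) {
      throw new IllegalArgumentException( "Could not parse SVG markup", e );
    }
  }

  public static void main( String[] args ) {
    String svg =
      "<svg xmlns='http://www.w3.org/2000/svg'>" +
      "<path id='a' d='M 10 10 L 20 20'/>" +
      "<g><path id='b' d='m 0 0 c 1 1 2 2 3 3 z'/></g>" +
      "</svg>";

    for( String d : extract( svg ) ) {
      System.out.println( d );
    }
  }
}
```

    This is enough when only the raw string is needed, but it leaves all path-grammar parsing to the caller.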

    Libraries

    You will need the following libraries:

    batik-anim.jar
    batik-awt-util.jar
    batik-bridge.jar
    batik-css.jar
    batik-dom.jar
    batik-ext.jar
    batik-gvt.jar
    batik-parser.jar
    batik-script.jar
    batik-svg-dom.jar
    batik-svggen.jar
    batik-util.jar
    batik-xml.jar
    xml-apis-ext.jar
    

    Load SVG File

    The main application loads the SVG file into a DOM, then converts the DOM to an SVG DOM. The initSVGDOM() method call is extremely important. Without it, the SVG-specific DOM methods (such as those used to read path segments) would not be available.

    import java.io.File;
    import java.io.IOException;
    
    import java.net.URI;
    
    import org.apache.batik.bridge.BridgeContext;
    import org.apache.batik.bridge.DocumentLoader;
    import org.apache.batik.bridge.GVTBuilder;
    import org.apache.batik.bridge.UserAgent;
    import org.apache.batik.bridge.UserAgentAdapter;
    // In Batik 1.8 and later, these two classes live in org.apache.batik.anim.dom.
    import org.apache.batik.dom.svg.SAXSVGDocumentFactory;
    import org.apache.batik.dom.svg.SVGOMSVGElement;
    import org.apache.batik.util.XMLResourceDescriptor;
    
    import org.w3c.dom.Document;
    import org.w3c.dom.NodeList;
    
    
    /**
     * Responsible for converting all SVG path elements into MetaPost curves.
     */
    public class SVGMetaPost {
      private static final String PATH_ELEMENT_NAME = "path";
      
      private Document svgDocument;
      
      /**
       * Creates an SVG Document given a URI.
       *
       * @param uri Path to the file.
       * @throws IOException Something went wrong reading or parsing the SVG file.
       */
      public SVGMetaPost( String uri ) throws IOException {
        setSVGDocument( createSVGDocument( uri ) );
      }
    
      /**
       * Finds all the path nodes and converts them to MetaPost code.
       */
      public void run() {
        NodeList pathNodes = getPathElements();
        int pathNodeCount = pathNodes.getLength();
    
        for( int iPathNode = 0; iPathNode < pathNodeCount; iPathNode++ ) {
          MetaPostPath mpp = new MetaPostPath( pathNodes.item( iPathNode ) );
          System.out.println( mpp.toCode() );
        }
      }
      
      /**
       * Returns a list of elements in the SVG document with names that
       * match PATH_ELEMENT_NAME.
       * 
       * @return The list of "path" elements in the SVG document.
       */
      private NodeList getPathElements() {
        return getSVGDocumentRoot().getElementsByTagName( PATH_ELEMENT_NAME );
      }
      
      /**
       * Returns an SVGOMSVGElement that is the document's root element.
       * 
       * @return The SVG document typecast into an SVGOMSVGElement.
       */
      private SVGOMSVGElement getSVGDocumentRoot() {
        return (SVGOMSVGElement)getSVGDocument().getDocumentElement();
      }
    
      /**
       * This will set the document to parse. This method also initializes
       * the SVG DOM enhancements, which are necessary to perform SVG and CSS
       * manipulations. The initialization is also required to extract information
       * from the SVG path elements.
       *
       * @param document The document that contains SVG content.
       */
      public void setSVGDocument( Document document ) {
        initSVGDOM( document );
        this.svgDocument = document;
      }
    
      /**
       * Returns the SVG document parsed upon instantiating this class.
       * 
       * @return A valid, parsed, non-null SVG document instance.
       */
      public Document getSVGDocument() {
        return this.svgDocument;
      }
      
      /**
       * Enhance the SVG DOM for the given document to provide CSS- and SVG-specific
       * DOM interfaces.
       * 
       * @param document The document to enhance.
       * @see <a href="http://wiki.apache.org/xmlgraphics-batik/BootSvgAndCssDom">BootSvgAndCssDom</a>
       */
      private void initSVGDOM( Document document ) {
        UserAgent userAgent = new UserAgentAdapter();
        DocumentLoader loader = new DocumentLoader( userAgent );
        BridgeContext bridgeContext = new BridgeContext( userAgent, loader );
        bridgeContext.setDynamicState( BridgeContext.DYNAMIC );
    
        // Enable CSS- and SVG-specific enhancements.
        (new GVTBuilder()).build( bridgeContext, document );
      }
    
      /**
       * Use the SAXSVGDocumentFactory to parse the given URI into a DOM.
       * 
       * @param uri The path to the SVG file to read.
       * @return A Document instance that represents the SVG file.
       * @throws IOException The file could not be read.
       */
      private Document createSVGDocument( String uri ) throws IOException {
        String parser = XMLResourceDescriptor.getXMLParserClassName();
        SAXSVGDocumentFactory factory = new SAXSVGDocumentFactory( parser );
        return factory.createDocument( uri );
      }
    
      /**
       * Reads a file and parses the path elements.
       * 
       * @param args args[0] - Filename to parse.
       * @throws IOException Error reading the SVG file.
       */
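      public static void main( String[] args ) throws IOException {
        // Fail with a usage message rather than an array index error.
        if( args.length < 1 ) {
          System.err.println( "Usage: SVGMetaPost <file.svg>" );
          return;
        }

        URI uri = new File( args[0] ).toURI();
        SVGMetaPost converter = new SVGMetaPost( uri.toString() );
        converter.run();
      }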
    }
    

    Note: Calling initSVGDOM() should be Batik's default behaviour unless otherwise specified. Alas, it isn't, and discovering this gem means reading documentation buried on their website.

    Parse SVG DOM

    Parsing the SVG DOM is then relatively trivial. The toCode() method is the workhorse of the class (the MetaPost base class and the MetaPostComment helper it references are not shown):

    import org.apache.batik.dom.svg.SVGItem;
    // In Batik 1.8 and later, SVGOMPathElement lives in org.apache.batik.anim.dom.
    import org.apache.batik.dom.svg.SVGOMPathElement;
    
    import org.w3c.dom.Node;
    import org.w3c.dom.svg.SVGPathSegList;
    
    /**
     * Responsible for converting an SVG path element to MetaPost. This
     * will convert just the bezier curve portion of the path element, not
     * its style. Typically the SVG path data is provided from the "d" attribute
     * of an SVG path node.
     */
    public class MetaPostPath extends MetaPost {
      private SVGOMPathElement pathElement;
      
      /**
       * Use to create an instance of a class that can parse an SVG path
       * element to produce MetaPost code.
       *
       * @param pathNode The path node containing a "d" attribute (output as MetaPost code).
       */
      public MetaPostPath( Node pathNode ) {
        setPathNode( pathNode );
      }
      
      /**
       * Converts this object's SVG path to a MetaPost draw statement.
       * 
       * @return A string that represents the MetaPost code for a path element.
       */
      public String toCode() {
        StringBuilder sb = new StringBuilder( 16384 );
        SVGOMPathElement pathElement = getPathElement();
        SVGPathSegList pathList = pathElement.getNormalizedPathSegList();
        
        int pathObjects = pathList.getNumberOfItems();
    
        sb.append( ( new MetaPostComment( getId() ) ).toString() );
        
        for( int i = 0; i < pathObjects; i++ ) {
          SVGItem item = (SVGItem)pathList.getItem( i );
          sb.append( String.format( "%s%n", item.getValueAsString() ) );
        }
    
        return sb.toString();
      }
      
      /**
       * Returns the value for the id attribute of the path element. If the
       * id isn't present, this returns an empty string.
       * 
       * @return A non-null, but possibly empty String.
       */
      private String getId() {
        // Element.getAttribute returns "" for a missing attribute, which
        // avoids the NullPointerException that getNamedItem( "id" ) risks.
        return getPathElement().getAttribute( "id" );
      }
      
      /**
       * Typecasts the given pathNode to an SVGOMPathElement for later analysis.
       * 
       * @param pathNode The path element that contains curves, lines, and other
       * SVG instructions.
       */
      private void setPathNode( Node pathNode ) {
        this.pathElement = (SVGOMPathElement)pathNode;
      }
    
      /**
       * Returns an SVG document element that contains path instructions (usually
       * for drawing on a canvas).
       * 
       * @return An object that contains a list of items representing pen
       * movements.
       */
      private SVGOMPathElement getPathElement() {
        return this.pathElement;
      }
    }
    

    Build

    Compiling will vary from environment to environment. A script similar to the following should help:

    #!/bin/bash
    mkdir -p ./build
    javac -cp "./lib/*" -d ./build ./source/*.java
    

    Be sure to put all the .jar files into the ./lib directory. Put the source files into the ./source directory.

    Run

    Create a script (or batch file) to execute the program:

    #!/bin/bash
    java -cp "./lib/*:./build" SVGMetaPost "$1"
    

    Output

    When run against a file containing a valid SVG path, this produces:

    $ ./run.sh stripe/trigon.svg 
    % path8078-6
    M 864.1712 779.3069
    C 864.1712 779.3069 868.04065 815.6211 871.4032 833.4621
    C 873.4048 844.08203 874.91724 855.0544 879.0846 864.82227
    C 884.24023 876.9065 895.2377 887.9899 900.0184 897.3661
    C 904.7991 906.7422 907.3466 918.3257 907.3466 918.3257
    C 907.3466 918.3257 892.80817 887.6536 864.1712 887.3086
    C 835.53424 886.9637 820.9958 918.3257 820.9958 918.3257
    C 820.9958 918.3257 823.6176 906.59644 828.32404 897.3661
    C 833.0304 888.1356 844.10223 876.9065 849.2578 864.82227
    C 853.4252 855.05444 854.9376 844.08203 856.93915 833.4621
    C 860.3017 815.6211 864.17114 779.3069 864.17114 779.3069
    z
    

    From here it should be clear how to read SVG path data into their corresponding SVG objects.
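
    As a rough sketch of the next step toward MetaPost, the normalized lines printed above could be turned into a draw statement. This is not the MetaPost or MetaPostComment class used by the solution (those are not shown). NormalizedPathToMetaPost is a hypothetical name, and the sketch assumes a single subpath made only of M, L, C, and z lines like the sample output.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: converts normalized path lines ("M x y", "L x y",
 * "C x1 y1 x2 y2 x y", "z") into one MetaPost draw statement.
 * Assumes a single subpath; a second "M" is rejected.
 */
class NormalizedPathToMetaPost {
  static String toDraw( List<String> lines ) {
    StringBuilder sb = new StringBuilder( "draw " );
    boolean started = false;

    for( String line : lines ) {
      if( line.trim().isEmpty() ) {
        continue;
      }

      String[] t = line.trim().split( "\\s+" );

      switch( t[0] ) {
        case "M":
          if( started ) {
            throw new IllegalArgumentException( "Only one subpath supported" );
          }
          sb.append( point( t, 1 ) );
          started = true;
          break;
        case "L":
          // A straight line segment.
          sb.append( "--" ).append( point( t, 1 ) );
          break;
        case "C":
          // A cubic Bezier segment with two explicit control points.
          sb.append( "..controls " ).append( point( t, 1 ) )
            .append( " and " ).append( point( t, 3 ) )
            .append( ".." ).append( point( t, 5 ) );
          break;
        case "z":
          // SVG closes a path with a straight line back to its start.
          sb.append( "--cycle" );
          break;
        default:
          throw new IllegalArgumentException( "Unsupported segment: " + line );
      }
    }

    return sb.append( ";" ).toString();
  }

  /** Formats the two tokens starting at index i as a MetaPost pair. */
  private static String point( String[] t, int i ) {
    return "(" + t[i] + "," + t[i + 1] + ")";
  }

  public static void main( String[] args ) {
    System.out.println( toDraw( Arrays.asList(
      "M 864.1712 779.3069",
      "C 864.1712 779.3069 868.04065 815.6211 871.4032 833.4621",
      "z" ) ) );
  }
}
```

    The coordinates are copied unchanged. Because the SVG y axis points down and MetaPost's points up, the drawing will come out mirrored vertically unless a transform such as yscaled -1 is applied on the MetaPost side.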

    Addendum

    Note that the simplest way to convert from SVG to MetaPost is:

    1. Convert SVG to PDF (e.g., using Inkscape or rsvg-convert).
    2. Convert PDF to MetaPost using pstoedit.