c++ visual-studio vtd-xml

How can I get rid of the EOFException when using vtd-xml in C++?


I am writing a C++ program to process an old dataset. I've already converted the files from SGML to XML using James Clark's sx tool. Since I have past experience using vtd-xml with MATLAB (which is Java-based), and vtd-xml has a C++ port, I decided to use it for this project. I am using vtd-xml version 2.12, since that was the newest version of the C++ port I could find. I got it to compile with Visual Studio 2019 by changing all calls of wcsdup to _wcsdup and by defining the _CRT_SECURE_NO_WARNINGS preprocessor symbol.

My program below appears to give correct output, but an exception is thrown while the XML file is being parsed (a test XML file is also below). The exception is an EOFException. I don't see anything obviously wrong with my XML files, and the error also occurs with the test file below, which is not one I converted from SGML.

My intuition is that if there were a bug in the C++ port, it would be easy to find information about it by searching for "vtd-xml EOFException". So it seems to me that the changes I made to get it to compile are the likely culprit, but I can't figure out how to get rid of the exception. Any ideas are welcome. If it comes to it, I am willing to switch to a different XML library, as long as it is free.
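On the suspicion that the wcsdup to _wcsdup rename is the culprit: Microsoft documents wcsdup as a deprecated POSIX-name alias for _wcsdup, and both return a malloc-allocated copy that must be released with free(). The rename by itself should therefore not change behavior. For comparison, here is a minimal sketch of what either function does; portable_wcsdup is a hypothetical name, not part of vtd-xml or the C runtime.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <cwchar>

// Hypothetical helper (not part of vtd-xml or the CRT): duplicates a wide
// string the way POSIX wcsdup and MSVC _wcsdup do, i.e. with malloc, so the
// result must be released with free(), not delete[].
wchar_t* portable_wcsdup(const wchar_t* src) {
    assert(src != nullptr);
    const std::size_t count = std::wcslen(src) + 1;  // include the terminating L'\0'
    wchar_t* copy = static_cast<wchar_t*>(std::malloc(count * sizeof(wchar_t)));
    if (copy != nullptr) {
        std::memcpy(copy, src, count * sizeof(wchar_t));
    }
    return copy;  // nullptr if allocation failed, like wcsdup
}
```

Because the copy comes from malloc, any string produced this way belongs with free(); mixing it with delete[] is undefined behavior.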

My code:

#include <iostream>
#include <fstream>
#include "VTDGen.h"
#include "autoPilot.h"
#include "customTypes.h"

using namespace std;
using namespace com_ximpleware;

int main() {
    // Open at the end (ios::ate) so tellg() reports the file size, then read the whole file.
    ifstream xml(".\\cd_catalog_short.xml", ios::binary | ios::ate);
    ifstream::pos_type pos = xml.tellg();
    long int length = static_cast<long int>(pos);
    char* pChars = new char[length];
    xml.seekg(0, ios::beg);
    xml.read(pChars, pos);
    xml.close();

    UCSChar node_path[] = L"/CATALOG/CD/TITLE";
    UCSChar* title;
    VTDGen vg;
    vg.setDoc(pChars, length);
    vg.parse(false);  // false = parse without namespace awareness
    AutoPilot ap;
    ap.selectXPath(node_path);
    VTDNav* vn = vg.getNav();
    ap.bind(vn);
    while (ap.evalXPath() != -1) {
        int ind = vn->getText();
        if (ind != -1) {
            title = vn->toNormalizedString(ind);
            wcout << title << endl;
            delete[] title;
        }
    }
    return 0;
}
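Separately from the exception, the file-reading code above never checks that the file opened: if it didn't, tellg() returns -1 and that value becomes the buffer length. Below is a minimal sketch of a checked reader, assuming a std::vector<char> is an acceptable container. read_whole_file is a hypothetical helper, and whether VTDGen::setDoc takes ownership of the buffer it receives is not established here, so check the port's source before handing it vector.data() instead of a new[] buffer.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of vtd-xml): reads a whole file in binary
// mode and throws if it cannot be opened or fully read, instead of letting a
// failed tellg() (which returns -1) turn into a bogus buffer length.
std::vector<char> read_whole_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    const std::streamoff size = in.tellg();  // position at end == file size
    if (size < 0) {
        throw std::runtime_error("cannot determine size of " + path);
    }
    std::vector<char> bytes(static_cast<std::size_t>(size));
    in.seekg(0, std::ios::beg);
    if (size > 0 && !in.read(bytes.data(), size)) {
        throw std::runtime_error("short read on " + path);
    }
    return bytes;
}
```

Called as read_whole_file(".\\cd_catalog_short.xml"), a missing file becomes a std::runtime_error that names the path, rather than a failed allocation of a negative-length array.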

A test XML file:

<?xml version="1.0" encoding="UTF-8"?>
<CATALOG>
  <CD>
    <TITLE>For the good times</TITLE>
    <ARTIST>Kenny Rogers</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>Mucik Master</COMPANY>
    <PRICE>8.70</PRICE>
    <YEAR>1995</YEAR>
  </CD>
  <CD>
    <TITLE>Big Willie style</TITLE>
    <ARTIST>Will Smith</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Columbia</COMPANY>
    <PRICE>9.90</PRICE>
    <YEAR>1997</YEAR>
  </CD>
  <CD>
    <TITLE>Tupelo Honey</TITLE>
    <ARTIST>Van Morrison</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>Polydor</COMPANY>
    <PRICE>8.20</PRICE>
    <YEAR>1971</YEAR>
  </CD>
</CATALOG>

My program output:

Exception thrown at 0x00007FF96A36A839 in em.exe: Microsoft C++ exception: com_ximpleware::EOFException at memory location 0x0000005498B6F350.

For the good times

Big Willie style

Tupelo Honey

C:\Users\Joe\source\repos\em\x64\Release\em.exe (process 16308) exited with code 0.

To automatically close the console when debugging stops, enable Tools->Options->Debugging->Automatically close the console when debugging stops.

Press any key to close this window . . .


Solution

  • vtd-xml seems to use EOFException as an internal signal rather than as a genuine error. To rule out the changes I made to compile the C++ port in Visual Studio, I ran an equivalent Java program against the latest Java release of vtd-xml (2.13-4-java), and the EOFException occurs there too: jdb shows it being thrown in VTDGen$UTF8Reader.getChar() and caught inside VTDGen.parse(). The "Exception thrown at ..." line in Visual Studio is the debugger reporting a first-chance exception; it appears for every throw, including ones the library catches itself, and the program still exits with code 0. Had I run the C++ program from a console instead of the Visual Studio IDE, I likely would never have noticed the exception.

    Here is the java code:

    /*
     * Copyright (C) 2002-2011 XimpleWare, info@ximpleware.com
     */
    import com.ximpleware.*;
    import com.ximpleware.xpath.*;
    import java.io.*;

    public class Tester {

        public static void main(String argv[]) {
            VTDGen vg = new VTDGen();

            if (vg.parseFile("./cd_catalog_short.xml", false)) {
                try {
                    VTDNav vn = vg.getNav();
                    AutoPilot ap = new AutoPilot(vn);
                    ap.selectXPath("/CATALOG/CD/TITLE");
                    int result = -1;
                    int count = 0;
                    while ((result = ap.evalXPath()) != -1) {
                        System.out.print("" + result + "  ");
                        System.out.print("Element name ==> " + vn.toString(result));
                        int t = vn.getText(); // get the index of the text (char data or CDATA)
                        if (t != -1)
                            System.out.println(" Text  ==> " + vn.toNormalizedString(t));
                        System.out.println("\n ============================== ");
                        count++;
                    }
                    System.out.println("Total # of element " + count);
                } catch (NavException e) {
                    System.out.println(" Exception during navigation " + e);
                } catch (XPathParseException e) {
                    System.out.println(" Exception during parse " + e);
                } catch (XPathEvalException e) {
                    System.out.println(" Exception during xpath evaluation " + e);
                }
            }
        }
    }
    

    And here is the program output in jdb:

    jdb -classpath .;ximpleware-2.13-4-java Tester

    Initializing jdb ...

    catch com.ximpleware.EOFException

    Deferring all com.ximpleware.EOFException. It will be set after the class is loaded.

    run

    run Tester

    Set uncaught java.lang.Throwable
    Set deferred uncaught java.lang.Throwable

    VM Started: Set deferred all com.ximpleware.EOFException

    Exception occurred: com.ximpleware.EOFException (to be caught at: com.ximpleware.VTDGen.parse(), line=2,663 bci=1,597)"thread=main", com.ximpleware.VTDGen$UTF8Reader.getChar(), line=774 bci=24
    774    throw e;

    main[1] cont

    7 Element name ==> TITLE Text ==> For the good times

    ==============================

    20 Element name ==> TITLE Text ==> Big Willie style

    ==============================

    33 Element name ==> TITLE Text ==> Tupelo Honey

    ==============================

    Total # of element 3

    The application exited
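The pattern in the jdb trace, an exception thrown by a low-level reader and caught by the parser that called it, can be illustrated with a small self-contained sketch. The classes below are hypothetical stand-ins, not vtd-xml's actual implementation: a debugger would report the throw, yet countTags always returns normally, which matches the C++ program printing all three titles and exiting with code 0.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-ins (not vtd-xml's real classes). The reader throws
// when it runs past the end of the buffer, and the caller treats that throw
// as the normal "end of document" signal.
struct EndOfInput {};

class ByteReader {
public:
    explicit ByteReader(std::string doc) : doc_(std::move(doc)) {}

    char getChar() {
        if (pos_ >= doc_.size()) {
            throw EndOfInput{};  // used as a signal, not as an error report
        }
        return doc_[pos_++];
    }

private:
    std::string doc_;
    std::size_t pos_ = 0;
};

// Counts '<' characters. The exception thrown at end of input is caught
// here, so callers only ever see an ordinary return value.
int countTags(const std::string& doc) {
    ByteReader reader(doc);
    int tags = 0;
    try {
        for (;;) {
            if (reader.getChar() == '<') {
                ++tags;
            }
        }
    } catch (const EndOfInput&) {
        // Reaching the end of the document is the expected way out of the loop.
    }
    return tags;
}
```

In Visual Studio, a first-chance notification like this can be silenced by unchecking the exception type under Debug > Windows > Exception Settings; it does not indicate a problem in the program.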