I am trying to load a FOAF file:
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

public class Testbed {
    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();
        try {
            model.read("http://www.csail.mit.edu/~lkagal/foaf", "RDF/XML");
        }
        catch (Exception ex) {
            System.out.println(ex.toString());
        }
    }
}
I am getting the following exception:
org.apache.jena.riot.RiotException: [line: 1, col: 50] White spaces are required between publicId and systemId.
I do not understand what this exception means. How can I fix it? Am I using the wrong format? (The file does not look like "TURTLE" or any other format.)
My environment (Windows 10 x64, apache-jena-3.1.1):
java version "1.8.0_112"
Java(TM) SE Runtime Environment (build 1.8.0_112-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.112-b15, mixed mode)
The URL http://www.csail.mit.edu/~lkagal/foaf actually redirects to http://people.csail.mit.edu/lkagal/foaf. That redirect is the cause of the error.
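One way to confirm the redirect independently of Jena is to request the URL with automatic redirect following switched off and inspect the status code and the Location header. The following is only a minimal sketch using the JDK's HttpURLConnection; the class name RedirectProbe and its helper methods are made up for illustration and are not part of Jena.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class RedirectProbe {
    // True for the HTTP status codes that tell the client to fetch another URL.
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
            || status == 307 || status == 308;
    }

    // Sends a single GET without following redirects.
    // Returns the Location header if the server redirects, otherwise null.
    static String redirectTarget(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setInstanceFollowRedirects(false);
        try {
            int status = conn.getResponseCode();
            return isRedirect(status) ? conn.getHeaderField("Location") : null;
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        try {
            String target = redirectTarget("http://www.csail.mit.edu/~lkagal/foaf");
            System.out.println(target == null ? "no redirect" : "redirects to " + target);
        } catch (IOException ex) {
            System.out.println("request failed: " + ex);
        }
    }
}
```

If the server still behaves as described here, the program should report the people.csail.mit.edu address as the redirect target.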
The problem was already reported and fixed in the development branch of Jena (bug [JENA-1263]).
Apache Jena uses Apache HttpClient for connection handling. In particular, Jena 3.1.0 uses HttpClient 4.2.6, which was updated to HttpClient 4.5.2 in Jena 3.1.1.
As @potame pointed out, the issue is not present in Jena 3.1.0. The reason is that 3.1.0 creates a connection which by default supports various features, including automatically following redirects (it uses new SystemDefaultHttpClient()).
In contrast, with the update of HttpClient in Jena 3.1.1, the code was modified to create a more minimal type of connection that is unable to follow redirects (it uses HttpClients.createMinimal()).
What happens is that, instead of reaching your FOAF file, it just retrieves the redirect message, which is:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="http://people.csail.mit.edu/lkagal/foaf">here</a>.</p>
<hr>
<address>Apache/2.2.16 (Debian) Server at www.csail.mit.edu Port 80</address>
</body></html>
and then tries to parse it with Apache Xerces, which is actually the one that throws the exception (you can see that by using ex.printStackTrace() instead of System.out.println(ex.toString())):
...
at org.apache.xerces.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:282)
at org.apache.xerces.impl.XMLScanner.reportFatalError(XMLScanner.java:1467)
at org.apache.xerces.impl.XMLScanner.scanExternalID(XMLScanner.java:1001)
...
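The parse failure can be reproduced without Jena or any network access by handing the start of the redirect page to an XML parser. The sketch below uses the JDK's built-in parser, which is derived from Xerces, so it should reject the HTML DOCTYPE in a similar way: an XML PUBLIC identifier must be followed by a system identifier, and the HTML 2.0 DOCTYPE has none. The class name RedirectBodyParseDemo, the helper parseFailure, and the shortened page body are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class RedirectBodyParseDemo {
    // Shortened copy of the 301 page that the server returns.
    static final String REDIRECT_BODY =
        "<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0//EN\">\n"
        + "<html><head>\n"
        + "<title>301 Moved Permanently</title>\n"
        + "</head><body></body></html>\n";

    // Parses the text as XML. Returns the parse error, or null if parsing succeeds.
    static SAXParseException parseFailure(String text) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(text)));
            return null;
        } catch (SAXParseException ex) {
            return ex;
        } catch (Exception other) {
            // Configuration or I/O problems are not expected for an in-memory string.
            throw new IllegalStateException(other);
        }
    }

    public static void main(String[] args) {
        SAXParseException ex = parseFailure(REDIRECT_BODY);
        if (ex == null) {
            System.out.println("parsed without errors");
        } else {
            System.out.println("[line: " + ex.getLineNumber() + ", col: "
                + ex.getColumnNumber() + "] " + ex.getMessage());
        }
    }
}
```

The point of the sketch is that the error comes from the content being HTML rather than RDF/XML, not from anything wrong in the FOAF file itself.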
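To get around the problem you can do one of the following:
use the redirect target http://people.csail.mit.edu/lkagal/foaf directly;
go back to Jena 3.1.0, which follows redirects;
move to a Jena version that includes the fix for JENA-1263;
provide Jena with your own "redirect capable" connection, to be used instead of the default one. You can do so by calling the method HttpOp.setDefaultHttpClient before calling model.read, for example: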
HttpOp.setDefaultHttpClient(HttpClientBuilder.create().build());