javaxmlrssfeedrome

RSS Feed - While parsing closing tag exception occurs


I am using rome-1.5.jar for parsing RSS Feed. But when it parse some rss feed it gives an error of closing meta tag.

RSS Feed link : NewYork Times RSS Feed Link

here is the code

 public static SyndFeed getRssFeed(String rsslUrl){
      try {
          URL url = new URL(rsslUrl);
          HttpURLConnection httpcon = (HttpURLConnection) url.openConnection();
          httpcon.addRequestProperty("User-Agent", "Mozilla/4.76");
          SyndFeedInput input = new SyndFeedInput();
          return input.build(new XmlReader(httpcon.getInputStream()));
    } catch (Exception e) {
        e.printStackTrace();
        return null;
    }
  }

Here is the exception

com.rometools.rome.io.ParsingFeedException: Invalid XML: Error on line 45: The element type "meta" must be terminated by the matching end-tag "</meta>".
    at com.rometools.rome.io.WireFeedInput.build(WireFeedInput.java:215)
    at com.rometools.rome.io.SyndFeedInput.build(SyndFeedInput.java:133)
    at com.gold.eloop.server.util.RssUtil.getRssFeed(RssUtil.java:132)
    at com.gold.eloop.server.util.RssUtil.getRssForProfile(RssUtil.java:228)
    at com.gold.eloop.server.util.RssUtil.mergeRssProfiles(RssUtil.java:269)
    at com.gold.eloop.server.util.outbound.MailMerger.getTransmission(MailMerger.java:581)
    at com.gold.eloop.server.services.MessageServiceImpl.sendTestMessage(MessageServiceImpl.java:192)
    at com.gold.eloop.server.remoteservices.MessageServiceRemote.sendTestMessage(MessageServiceRemote.java:309)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at com.google.gwt.user.server.rpc.RPC.invokeAndEncodeResponse(RPC.java:562)
    at com.google.gwt.user.server.rpc.RemoteServiceServlet.processCall(RemoteServiceServlet.java:188)
    at com.google.gwt.user.server.rpc.RemoteServiceServlet.processPost(RemoteServiceServlet.java:224)
    at com.google.gwt.user.server.rpc.AbstractRemoteServiceServlet.doPost(AbstractRemoteServiceServlet.java:62)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:362)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:729)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.handler.RequestLogHandler.handle(RequestLogHandler.java:49)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:324)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:505)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:843)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:647)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:211)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:380)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:395)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:488)
Caused by: org.jdom2.input.JDOMParseException: Error on line 45: The element type "meta" must be terminated by the matching end-tag "</meta>".
    at org.jdom2.input.sax.SAXBuilderEngine.build(SAXBuilderEngine.java:232)
    at org.jdom2.input.sax.SAXBuilderEngine.build(SAXBuilderEngine.java:303)
    at org.jdom2.input.SAXBuilder.build(SAXBuilder.java:1196)
    at com.rometools.rome.io.WireFeedInput.build(WireFeedInput.java:212)
    ... 34 more
Caused by: org.xml.sax.SAXParseException; lineNumber: 45; columnNumber: 9; The element type "meta" must be terminated by the matching end-tag "</meta>".
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at org.jdom2.input.sax.SAXBuilderEngine.build(SAXBuilderEngine.java:217)
    ... 37 more

Is any I did wrong in this code. Please help me out to resolve this error.


Solution

  • The specified URL http://www.nytimes.com/services/xml/rss/index.html does not return an RSS document.

    There is content like:

    <meta name="PT" content="Member Center">
    <meta name="PST" content="RSS Page">
    

    which an RSS processor will fail on.

    That page is a listing of RSS feeds, not an RSS feed itself.

    The first link is http://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml: try passing that to your RSS processor.