Update:
When using the NSXMLParser initializer initWithContentsOfURL, rather than parsing as the XML feed is downloaded, it appears to load the entire XML file into memory and only then start parsing. This is problematic if the XML feed is large: it uses an excessive amount of RAM, and it is inherently inefficient because parsing only starts once the download is done rather than running in parallel with it.
Has anyone discovered how to parse as the feed is being streamed to the device using NSXMLParser? Yes, you can use LibXML2 (as discussed below), but it seems like it should be possible to do it with NSXMLParser. But it's eluding me.
Original question:
I was wrestling with using NSXMLParser to read XML from a web stream. If you use initWithContentsOfURL, the interface may lead one to infer that it streams the XML from the web, but it doesn't seem to do so; rather, it appears to load the entire XML file before any parsing takes place. For modest-sized XML files that's fine, but for really large ones it's problematic.
I have seen discussions of using NSXMLParser in conjunction with initWithStream with some customized NSInputStream that is streaming from the web. For example, there have been answers to this that suggest using something like the CFStreamCreateBoundPair referred to in the following Cocoa Builder post and the discussion of Setting Up Socket Streams in the Apple Stream Programming Guide, but I have not gotten it to work. I even tried writing my own NSInputStream subclass that used an NSURLConnection (which is, itself, pretty good at streaming), but I wasn't able to get it to work in conjunction with NSXMLParser.
In the end, I decided to use LibXML2 rather than NSXMLParser, as demonstrated in the Apple XMLPerformance sample, but I was wondering if anyone had any luck getting streaming from a web source working with NSXMLParser. I've seen plenty of "theoretically you could do x" sort of answers, suggesting everything from CFStreamCreateBoundPair to grabbing the HTTPBodyStream from NSURLRequest, but I've yet to come across a working demonstration of streaming with NSXMLParser.
The Ray Wenderlich article How To Choose The Best XML Parser for Your iPhone Project seems to confirm that NSXMLParser is not well suited for large XML files, but with all of the posts about possible NSXMLParser-based work-arounds for streaming really large XML files, I'm surprised I have yet to find a working demonstration of this. Does anyone know of a functioning NSXMLParser implementation that streams from the web? Clearly, I can just stick with LibXML2 or some other equivalent XML parser, but the notion of streaming with NSXMLParser seems tantalizingly close.
-[NSXMLParser initWithStream:] is the only interface to NSXMLParser that currently performs a streaming parse of the data. Hooking it up to an asynchronous NSURLConnection that's providing data incrementally is unwieldy because NSXMLParser takes a blocking, "pull"-based approach to reading from the NSInputStream. That is, -[NSXMLParser parse] does something like the following when dealing with an NSInputStream:
while (1) {
    NSInteger length = [stream read:buffer maxLength:maxLength];
    if (length <= 0) // 0 means end of stream, a negative value means an error
        break;
    // Parse data …
}
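To make the pull model concrete, here is a minimal sketch that runs the same kind of loop against an in-memory stream. It is only an illustration of the read pattern, not NSXMLParser's actual implementation; DrainStream, the 16-byte buffer, and the sample XML are all assumptions made for the example.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: drains a stream with a blocking read loop and
// returns the number of bytes it saw.
static NSUInteger DrainStream(NSInputStream *stream)
{
    uint8_t buffer[16]; // deliberately small so several reads are needed
    NSUInteger total = 0;
    [stream open];
    while (1) {
        // Blocks until bytes are available, the stream ends, or an error occurs.
        NSInteger length = [stream read:buffer maxLength:sizeof(buffer)];
        if (length <= 0) // 0 is end of stream, a negative value is an error
            break;
        total += (NSUInteger)length; // a real parser would consume the bytes here
    }
    [stream close];
    return total;
}

int main(void)
{
    @autoreleasepool {
        NSData *xml = [@"<root><item>1</item><item>2</item></root>" dataUsingEncoding:NSUTF8StringEncoding];
        NSUInteger total = DrainStream([NSInputStream inputStreamWithData:xml]);
        NSLog(@"drained %lu of %lu bytes", (unsigned long)total, (unsigned long)xml.length);
    }
    return 0;
}
```

The caller decides when to ask for more bytes, so a source that delivers data through callbacks has to buffer it until the next read arrives.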
In order to incrementally provide data to this parser, a custom NSInputStream subclass is needed that funnels data received by the NSURLConnectionDelegate calls on a background queue or runloop over to the -read:maxLength: call that NSXMLParser is waiting on.
A proof-of-concept implementation follows:
#import <Foundation/Foundation.h>
@interface ReceivedDataStream : NSInputStream <NSURLConnectionDelegate>
@property (retain) NSURLConnection *connection;
@property (retain) NSMutableArray *bufferedData;
@property (assign, getter=isFinished) BOOL finished;
@property (retain) dispatch_semaphore_t semaphore;
@end
@implementation ReceivedDataStream
- (id)initWithContentsOfURL:(NSURL *)url
{
    if (!(self = [super init]))
        return nil;
    NSURLRequest *request = [NSURLRequest requestWithURL:url];
    self.connection = [[[NSURLConnection alloc] initWithRequest:request delegate:self startImmediately:NO] autorelease];
    self.connection.delegateQueue = [[[NSOperationQueue alloc] init] autorelease];
    self.bufferedData = [NSMutableArray array];
    self.semaphore = dispatch_semaphore_create(0);
    return self;
}
- (void)dealloc
{
    self.connection = nil;
    self.bufferedData = nil;
    self.semaphore = nil;
    [super dealloc];
}
- (BOOL)hasBufferedData
{
    @synchronized (self) { return self.bufferedData.count > 0; }
}
#pragma mark - NSInputStream overrides
- (void)open
{
    NSLog(@"open");
    [self.connection start];
}
- (void)close
{
    NSLog(@"close");
    [self.connection cancel];
}
- (NSInteger)read:(uint8_t *)buffer maxLength:(NSUInteger)maxLength
{
    NSLog(@"read:%p maxLength:%lu", buffer, (unsigned long)maxLength);
    // Block until data arrives or the connection finishes. Loop instead of
    // waiting once: signals pile up while data is already buffered, so a
    // single wait can return without anything new to read.
    while (!self.isFinished && !self.hasBufferedData)
        dispatch_semaphore_wait(self.semaphore, DISPATCH_TIME_FOREVER);
    if (!self.hasBufferedData)
        return 0; // finished and fully drained: end of stream
    NSData *data = nil;
    @synchronized (self) {
        data = [[self.bufferedData[0] retain] autorelease];
        [self.bufferedData removeObjectAtIndex:0];
        if (data.length > maxLength) {
            // Keep whatever does not fit for the next read.
            NSData *remainingData = [NSData dataWithBytes:(const uint8_t *)data.bytes + maxLength length:data.length - maxLength];
            [self.bufferedData insertObject:remainingData atIndex:0];
        }
    }
    NSUInteger copiedLength = MIN([data length], maxLength);
    memcpy(buffer, [data bytes], copiedLength);
    return copiedLength;
}
#pragma mark - NSURLConnectionDelegate methods
- (void)connection:(NSURLConnection *)connection didReceiveData:(NSData *)data
{
    NSLog(@"connection:%@ didReceiveData:…", connection);
    @synchronized (self) {
        [self.bufferedData addObject:data];
    }
    dispatch_semaphore_signal(self.semaphore);
}
- (void)connectionDidFinishLoading:(NSURLConnection *)connection
{
    NSLog(@"connectionDidFinishLoading:%@", connection);
    self.finished = YES;
    dispatch_semaphore_signal(self.semaphore);
}
@end
@interface ParserDelegate : NSObject <NSXMLParserDelegate>
@end
@implementation ParserDelegate
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qualifiedName attributes:(NSDictionary *)attributeDict
{
    NSLog(@"parser:%@ didStartElement:%@ namespaceURI:%@ qualifiedName:%@ attributes:%@", parser, elementName, namespaceURI, qualifiedName, attributeDict);
}
- (void)parserDidEndDocument:(NSXMLParser *)parser
{
    NSLog(@"parserDidEndDocument:%@", parser);
    CFRunLoopStop(CFRunLoopGetCurrent());
}
@end
int main(int argc, char **argv)
{
    @autoreleasepool {
        NSURL *url = [NSURL URLWithString:@"http://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xml"];
        ReceivedDataStream *stream = [[ReceivedDataStream alloc] initWithContentsOfURL:url];
        NSXMLParser *parser = [[NSXMLParser alloc] initWithStream:stream];
        parser.delegate = [[[ParserDelegate alloc] init] autorelease];
        [parser performSelector:@selector(parse) withObject:nil afterDelay:0.0];
        CFRunLoopRun();
    }
    return 0;
}
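To check the parser and delegate wiring without a network connection, the same initWithStream: setup can be pointed at an in-memory stream. The following is a minimal sketch under a few assumptions: CountingDelegate and CountElements are hypothetical names, the sample XML is made up, and unlike the proof of concept above it is written for ARC (compile with -fobjc-arc).

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate that only counts start tags.
@interface CountingDelegate : NSObject <NSXMLParserDelegate>
@property (assign) NSUInteger elementCount;
@end

@implementation CountingDelegate
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qualifiedName attributes:(NSDictionary *)attributeDict
{
    self.elementCount += 1;
}
@end

// Hypothetical helper: parses the XML through initWithStream: and returns
// the number of elements seen, or NSNotFound if parsing failed.
static NSUInteger CountElements(NSString *xml)
{
    NSData *data = [xml dataUsingEncoding:NSUTF8StringEncoding];
    NSInputStream *stream = [NSInputStream inputStreamWithData:data];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithStream:stream];
    // The parser does not retain its delegate, so keep a strong local reference.
    CountingDelegate *delegate = [[CountingDelegate alloc] init];
    parser.delegate = delegate;
    BOOL succeeded = [parser parse]; // blocks until the stream is exhausted
    return succeeded ? delegate.elementCount : NSNotFound;
}

int main(void)
{
    @autoreleasepool {
        NSUInteger count = CountElements(@"<root><item>1</item><item>2</item></root>");
        NSLog(@"elements seen: %lu", (unsigned long)count);
    }
    return 0;
}
```

Because -parse blocks until the stream reports its end, anything that feeds the stream has to run on a different thread or queue than the one calling -parse.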