I have been trying to parse an XHTML document with TouchXML, but my XPath queries never find any tags.
Below is the XHTML:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for Mac OS X (vers 25 March 2009), see www.w3.org" />
<title></title>
</head>
<body>
<p>
<a href="http://www.flickr.com/photos/55397648@N00/5987335786/"
title="casavermeer5.jpg by the style files, on Flickr">
<img src="http://farm7.static.flickr.com/6127/5987335786_abec990554_o.jpg"
width="500" height="750" border="0" alt="casavermeer5.jpg" />
</a>
</p>
</body>
</html>
So the document contains a "p" tag, an "a" tag, and an "img" tag.
Here is the code I used:
NSError *error = nil;
CXHTMLDocument *doc = [[[CXHTMLDocument alloc] initWithXHTMLString:XHTML options:0 error:&error] autorelease];
NSLog(@"error %@", [error localizedDescription]);
NSLog(@"doc children count = %lu", (unsigned long)[doc childCount]);
NSArray *imgNodeArray = [doc nodesForXPath:@"//img" error:&error];
NSLog(@"imgNodeArray = %lu", (unsigned long)[imgNodeArray count]);
NSLog(@"error %@", [error localizedDescription]);
The results are:
error (null)
doc children count = 2
imgNodeArray = 0
error (null)
So there is no error at all when parsing the XHTML document, and no error from the XPath query. The document also reports two children under the root (the "body" tag and the "head" tag). The problem is that the query cannot find the "img" tag. I have tried replacing "img" with other tag names (such as p, a, even body and head), with no luck at all.
Can someone help me here?
P.S.
The original document is actually HTML. I first used the CTidy class in the TouchXML library to tidy the HTML into XHTML. The XHTML above is the CTidy output.
I also tried adding a namespace mapping to the XPath query, like this:
NSMutableDictionary *namespaceDict = [NSMutableDictionary dictionary];
[namespaceDict setValue:@"http://www.w3.org/1999/xhtml" forKey:@"xhtml"];
and changed the XPath query to:
NSArray *imgNodeArray = [doc nodesForXPath:@"//xhtml:img" namespaceMappings:namespaceDict error:&error];
Still no luck: the query returns no results.
Try //img.
When you use //, the query matches the img tag no matter where it is in the page.
It is better than //xhtml:img, because the tag hierarchy sometimes changes a bit in the underlying markup, so it is better to keep the query global and not too specific.
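If the unprefixed query still comes back empty, one way to check whether the XHTML default namespace is getting in the way is to match on local-name(), which is standard XPath 1.0 and compares only the part of the element name after any prefix. The sketch below is an illustration under assumptions, not a confirmed fix: the helper name CountImageNodes and the import line are made up for the example, it reuses the initWithXHTMLString:options:error: and nodesForXPath:error: calls from the question, and it has not been run against this document.

```objective-c
#import <Foundation/Foundation.h>
#import "CXHTMLDocument.h" // assumed TouchXML header path; adjust to your project setup

// Hypothetical helper (not part of TouchXML): counts img elements in the
// given XHTML string, ignoring whatever namespace they belong to.
static NSUInteger CountImageNodes(NSString *xhtml)
{
    NSError *error = nil;

    // Same initializer and manual reference counting style as in the question.
    CXHTMLDocument *doc = [[[CXHTMLDocument alloc] initWithXHTMLString:xhtml
                                                               options:0
                                                                 error:&error] autorelease];
    if (doc == nil) {
        NSLog(@"parse failed: %@", [error localizedDescription]);
        return 0;
    }

    // local-name() looks only at the local part of each element name, so this
    // expression matches img elements with or without a namespace.
    NSArray *nodes = [doc nodesForXPath:@"//*[local-name()='img']" error:&error];

    // Messaging nil returns 0, so this is safe even if the query yields nothing.
    return [nodes count];
}
```

If this returns a non-zero count while //img returns zero, the default namespace is the likely cause; if both return zero, the problem is probably somewhere else, such as how the document was parsed.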