I am using hpple to try and grab a torrent description from ThePirateBay. Currently, I'm using this code:
NSString *path = @"//div[@id='content']/div[@id='main-content']/div/div[@id='detailsouterframe']/div[@id='detailsframe']/div[@id='details']/div[@class='nfo']/pre/node()";
NSArray *nodes = [parser searchWithXPathQuery:path];
for (TFHppleElement *element in nodes) {
    NSString *postid = [element content];
    if (postid) {
        [texts appendString:postid];
    }
}
This returns just the plain text, and not any of the URLs for screenshots. Is there any way to get all links and other tags, not just plain text? The Pirate Bay page is formatted like so:
<pre>
<a href="http://img689.imageshack.us/img689/8292/itskindofafunnystory201.jpg" rel="nofollow">
http://img689.imageshack.us/img689/8292/itskindofafunnystory201.jpg</a>
More texts about the file
</pre>
That's an easy job and you did it almost correctly!
What you want is the content (or an attribute) of the a tag, so you need to tell the parser that you want it.
Just change your XPath to
@"//div[@id='content']/div[@id='main-content']/div/div[@id='detailsouterframe']/div[@id='detailsframe']/div[@id='details']/div[@class='nfo']/pre/a"
(You missed the a at the very end, and you do not need node().)
Output:
http://www.imdb.com/title/tt1904996/
http://leetleech.org/images/65823608764828593230.png
http://leetleech.org/images/44748070481477652927.png
http://leetleech.org/images/42024611449329122742.png
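If you only want the screenshot URLs, you can skip the first node (the IMDb link in the output above) and collect the rest, something like
NSMutableArray *screenshotURLs = [NSMutableArray array];
for (NSUInteger i = 1; i < nodes.count; i++) {
    // nodes[i] is a TFHppleElement; store its text (the URL), not the element itself
    NSString *url = [nodes[i] content];
    if (url) {
        [screenshotURLs addObject:url];
    }
}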
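If you want the links and the surrounding text together, as the question asks, one option is to keep node() in the query and branch on each node's tag name. The following is only a rough sketch: the helper name DescriptionFromData and the shortened XPath are assumptions made for illustration, and it relies on TFHppleElement's tagName, content and objectForKey: behaving as in the hpple version you have installed.

```objective-c
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper (not part of hpple): walks the direct children of the
// <pre> block and returns the plain text plus the href of every <a> element.
static NSString *DescriptionFromData(NSData *htmlData) {
    TFHpple *parser = [TFHpple hppleWithHTMLData:htmlData];

    // Shortened XPath, assumed to match the same <pre> as the full path above.
    NSArray *nodes = [parser searchWithXPathQuery:@"//div[@class='nfo']/pre/node()"];

    NSMutableString *texts = [NSMutableString string];
    for (TFHppleElement *element in nodes) {
        if ([element.tagName isEqualToString:@"a"]) {
            // Link: read the target from the href attribute.
            NSString *href = [element objectForKey:@"href"];
            if (href) {
                [texts appendFormat:@"%@\n", href];
            }
        } else {
            // Anything else (typically a text node): keep its content if present.
            NSString *content = [element content];
            if (content) {
                [texts appendString:content];
            }
        }
    }
    return texts;
}
```

Reading href instead of the link text also covers pages where the visible text is shortened or differs from the actual target.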