So I think this is my last Hpple question! I have found an entry in the HTML document I am parsing with Hpple whose text I cannot fully extract. I have tried many different XPath queries, but no luck. Here is a sample of the HTML.
I can get the text starting with "Today's project" with //div[@class = 'entry-content']/p. I can also get the next tag with //div[@class = 'entry-content']//a[@title]//*, along with the rest of the text inside the span. However, as you can see, there is still some text after the closing </span>, and nothing I have tried retrieves it. I have tried looking at the children of the element, and I have tried //div[@class = 'entry-content']/p//text() and //div[@class = 'entry-content']/p//following::*; nothing works. If anyone has any ideas, I am all ears! Thanks again for your time.
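One possible reason the p//text() attempt looked like it failed: that query returns text nodes, and on a Hpple text node the string is in `content`, while `text` reads the node's first text *child*, which a text node does not have, so it likely comes back nil. The sketch below assumes the TFHpple/TFHppleElement classes used elsewhere in this thread; the function name EntryContentText is hypothetical, and the behavior of `text` may differ between Hpple versions.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper: returns the text of every text node under each <p>
// in the entry-content div, in document order, including text that comes
// after a closing </span>.
// Assumption: Hpple returns the nodes selected by text() as TFHppleElement
// objects whose string is stored in `content`.
static NSString *EntryContentText(NSData *htmlData) {
    TFHpple *parser = [TFHpple hppleWithHTMLData:htmlData];
    NSArray *textNodes =
        [parser searchWithXPathQuery:@"//div[@class='entry-content']/p//text()"];

    NSMutableString *result = [NSMutableString string];
    for (TFHppleElement *textNode in textNodes) {
        // Read `content`, not `text`: `text` looks at the node's children,
        // and a text node has none.
        if (textNode.content != nil) {
            [result appendString:textNode.content];
        }
    }
    return result;
}
```

With a paragraph shaped like "Today's project <span>...</span> It was one.....", the returned string should contain the span's text followed by the text after it.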
EDIT #1: While trying different things, I looked at the HTML more closely. Under the p tag is the text I need, "Today's project...", then a span that changes the text color and contains a link, followed by more text. What I need to do is skip over that span and continue reading the text. Maybe my question should be: how do you skip over a span? Thanks for looking.
EDIT #2: Well, I am going to start a bounty on this one; I really need some help. I have looked everywhere and tried a ton of different things, but nothing is working for me. I cannot get the text after the closing span, and this format appears often: the author of the blog I am parsing for the app sometimes changes the style of her words, and I cannot get the text after the style change. Any help would be appreciated. Thanks again for looking.
EDIT #3: Here is another screenshot of the DOM tree. As you can see, I am parsing the div with class "entry-content", and the text in question is visible. It starts with "Today...", then comes the span that changes the text color (I can get that text). What I need is the text after it, " It was one.....", right before the closing p tag.
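To literally skip over the span, XPath's following-sibling axis can select only the text nodes that come after an element within the same p. This is a sketch, assuming the span (or a link wrapping it) is a direct child of the p and that Hpple returns text nodes with their string in `content`; TextAfterSpan is a hypothetical name.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper: returns only the text that follows a child element
// (e.g. the colored <span>) inside each <p> of the entry-content div.
// Assumption: the span is a direct child of <p>.
static NSString *TextAfterSpan(NSData *htmlData) {
    TFHpple *parser = [TFHpple hppleWithHTMLData:htmlData];

    // p/* matches any element directly inside <p>; following-sibling::text()
    // then selects the text nodes after it, i.e. the text before </p>.
    // XPath returns a node-set, so no text node is returned twice.
    NSString *query =
        @"//div[@class='entry-content']/p/*/following-sibling::text()";
    NSArray *textNodes = [parser searchWithXPathQuery:query];

    NSMutableString *result = [NSMutableString string];
    for (TFHppleElement *textNode in textNodes) {
        if (textNode.content != nil) {
            [result appendString:textNode.content];
        }
    }
    return result;
}
```

Because p/* matches any child element, text after a link or a bold tag is also picked up; changing it to p/span restricts the query to spans only.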
I also put the entire HTML in a gist, HERE. The line in question is 102, although the HTML did not copy over very nicely. Thanks.
I made some changes to the code so it goes further down the hierarchy, and it worked on your HTML sample. Note: I'm appending all of the entry-content text to a single NSMutableString to keep things simple. As I warned you in the comment, use it with caution. :-)
// At the top of the .m file:
#import "TFHpple.h"

// Load the saved page (test.html in the app bundle) and parse it.
NSString *filePath = [[NSBundle mainBundle] pathForResource:@"test" ofType:@"html"];
NSData *data = [NSData dataWithContentsOfFile:filePath];
TFHpple *detailParser = [TFHpple hppleWithHTMLData:data];

// Find every <div class="entry-content">.
NSString *xpathQueryString = @"//div[@class='entry-content']";
NSArray *nodes = [detailParser searchWithXPathQuery:xpathQueryString];

// Walk up to four levels below each div, appending whatever text each node carries.
NSMutableString *test = [NSMutableString string];
for (TFHppleElement *element in nodes) {
    for (TFHppleElement *child in element.children) {
        if (child.content != nil) {
            [test appendString:child.content];
        }
        // Enumerating an empty (or nil) children array does nothing, so no count check is needed.
        for (TFHppleElement *grandchild in child.children) {
            if (grandchild.content != nil) {
                [test appendString:grandchild.content];
            }
            for (TFHppleElement *greatgrandchild in grandchild.children) {
                if (greatgrandchild.content != nil) {
                    [test appendString:greatgrandchild.content];
                }
                for (TFHppleElement *greatgreatgrandchild in greatgrandchild.children) {
                    if (greatgreatgrandchild.text != nil) {
                        [test appendString:greatgreatgrandchild.text];
                    }
                    if (greatgreatgrandchild.content != nil) {
                        [test appendString:greatgreatgrandchild.content];
                    }
                }
            }
        }
    }
}
NSLog(@"test = %@", test);
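The nested loops in the answer stop at a fixed depth, so text nested one level deeper would be missed. A recursive walk removes that limit. This is a sketch, assuming a Hpple version whose TFHppleElement provides -isTextNode and includes text nodes in `children`; AppendText and EntryContentTextRecursive are hypothetical names. Appending only text nodes is a deliberate choice: in some Hpple versions an element's `content` already contains its descendants' text, so appending it at every level could repeat text.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper: appends the text of every text node under `element`,
// depth first and in document order, so nesting depth no longer matters.
// Assumption: -isTextNode exists and text nodes appear in `children`.
static void AppendText(TFHppleElement *element, NSMutableString *output) {
    if ([element isTextNode]) {
        [output appendString:element.content];
        return;
    }
    for (TFHppleElement *child in element.children) {
        AppendText(child, output);
    }
}

// Hypothetical entry point: collects all text in each entry-content div.
static NSString *EntryContentTextRecursive(NSData *htmlData) {
    TFHpple *parser = [TFHpple hppleWithHTMLData:htmlData];
    NSArray *entries =
        [parser searchWithXPathQuery:@"//div[@class='entry-content']"];

    NSMutableString *output = [NSMutableString string];
    for (TFHppleElement *entry in entries) {
        AppendText(entry, output);
    }
    return output;
}
```

In an app, EntryContentTextRecursive could be called with the same NSData loaded from test.html and its result logged in place of the `test` string.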