html ios parsing hpple

Hpple, getting text after </span>


So I think this is my last Hpple question! I have found an entry in the HTML doc that I am parsing with Hpple. I have tried many different queries, but no luck. Here is a sample of the HTML.

I can get the text starting with "Today's project" with //div[@class = 'entry-content']/p. I can also get the next tag with //div[@class = 'entry-content']//a[@title]//* along with all the text after it. As you can see, though, there is still some text after the closing </span>, and nothing that I have tried will retrieve it. I have tried looking at the children of the element, and I have tried //div[@class = 'entry-content']/p//text() and //div[@class = 'entry-content']/p//following::*, but nothing works. If anyone has any ideas, I am all ears! Thanks again for all of your time.

EDIT #1: While trying different things, I took another look at the HTML. Under the p tag is the text I need, "Today's project...", then there is a span that changes the text color and includes a link, followed by more text. What I need to do is jump over that span and continue reading the text. Maybe my question should be: how do you jump over a span? Thanks for looking.

EDIT #2: Well, I am going to start a bounty on this one. I really need some help. I have looked everywhere and tried a ton of different things, but nothing is working for me. I cannot get the text after that one closed span, and this format appears often. The author of the blog I am parsing for the app sometimes changes the style of her words, and I cannot get the text that comes after the styled part. Any help would be appreciated. Thanks again for looking.

EDIT #3: Here is another screenshot of the DOM tree. As you can see, I am parsing the div with class "entry-content". The text in question is visible. It starts with "Today...", then comes the span that changes the color of the text; I can get that text. It is the text after that, " It was one.....", right before the closing p tag, that I need.

Dom Tree

I also placed the entire HTML in a gist, HERE. The line in question is 102, although the HTML did not copy over that nicely. Thanks.


Solution

  • I made some changes to the code to go further down the hierarchy, and it worked on your HTML sample. Note: I'm appending all of the entry-content text to a single NSMutableString to make it easier. As I warned you in the comment, use it with caution. :-)

    // Load the sample HTML from the app bundle.
    NSString *filePath = [[NSBundle mainBundle] pathForResource:@"test" ofType:@"html"];
    NSData *data = [NSData dataWithContentsOfFile:filePath];

    // Parse it and select every entry-content div.
    TFHpple *detailParser = [TFHpple hppleWithHTMLData:data];
    NSString *xpathQueryString = @"//div[@class='entry-content']";
    NSArray *node = [detailParser searchWithXPathQuery:xpathQueryString];

    // Accumulates all of the text found below the matched divs.
    NSMutableString *test = [[NSMutableString alloc] initWithString:@""];
    
    for (TFHppleElement *element in node) {
        for (TFHppleElement *child in element.children) {            
            if (child.content != nil) {
                [test appendString:child.content];
            }
            if ([child.children count] != 0) {
                for (TFHppleElement *grandchild in child.children) {
                    if (grandchild.content != nil) {
                        [test appendString:grandchild.content];
                    }
                    for (TFHppleElement *greatgrandchild in grandchild.children) {
                        if (greatgrandchild.content != nil) {
                            [test appendString:greatgrandchild.content];
                        }
                        for (TFHppleElement *greatgreatgrandchild in greatgrandchild.children) {
                            if (greatgreatgrandchild.text != nil) {
                                [test appendString:greatgreatgrandchild.text];
                            }
                            if (greatgreatgrandchild.content != nil) {
                                [test appendString:greatgreatgrandchild.content];
                            }
                        }
                    }
                }
            }
        }
    }
    
    NSLog(@"test = %@", test);
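
The nested loops above stop at a fixed depth, so text wrapped in one more level of markup would be missed. A recursive walk is one way around that. What follows is only a minimal sketch, assuming TFHppleElement exposes `content` and `children` as used in the code above. `AppendAllContent` and `EntryContentText` are hypothetical helper names, not part of Hpple.

```objective-c
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper (not part of Hpple): appends the content of `element`
// and then of every descendant, depth first, to `buffer`.
static void AppendAllContent(TFHppleElement *element, NSMutableString *buffer) {
    NSString *content = element.content;
    if (content != nil) {
        [buffer appendString:content];
    }
    // Recurse into the children, whatever their depth or tag.
    for (TFHppleElement *child in element.children) {
        AppendAllContent(child, buffer);
    }
}

// Hypothetical helper: collects the text of every entry-content div.
static NSString *EntryContentText(NSData *htmlData) {
    TFHpple *parser = [TFHpple hppleWithHTMLData:htmlData];
    NSArray *entries = [parser searchWithXPathQuery:@"//div[@class='entry-content']"];

    NSMutableString *buffer = [NSMutableString string];
    for (TFHppleElement *entry in entries) {
        AppendAllContent(entry, buffer);
    }
    return buffer;
}
```

Calling `EntryContentText(data)` with the same `data` as above should give a string comparable to `test`, with two differences. It also appends the `content` of the matched div itself, if any, and it never reads `.text`, relying on text nodes showing up as children with their own `content`. If your Hpple version reports text differently you may see duplicated or missing pieces, so the same caution applies.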