
libxslt: XML-to-HTML text encoding issue on iOS


I am using libxslt to convert XML to HTML in an iOS project. The XML I pass in has its special characters encoded as entities, but the output I get back is not encoded, so when I then try to parse the resulting HTML I get a parser error.

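#import <libxml/parser.h>
#import <libxslt/transform.h>
#import <libxslt/xsltutils.h>

NSString *xml = @"<div>a&lt;b</div>"; // not the exact input, but encoded similarly
NSData *xmlMem = [xml dataUsingEncoding:NSUTF8StringEncoding];

NSString *styleSheetPath = [[NSBundle mainBundle] pathForResource:fileName ofType:fileExtension];
xmlDocPtr doc, res;
xsltStylesheetPtr sty;
xmlSubstituteEntitiesDefault(1); // expand entity references while parsing
xmlLoadExtDtdDefaultValue = 1;   // load external DTD subsets
sty = xsltParseStylesheetFile((const xmlChar *)[styleSheetPath cStringUsingEncoding:NSUTF8StringEncoding]);
doc = xmlParseMemory((const char *)[xmlMem bytes], (int)[xmlMem length]);
res = xsltApplyStylesheet(sty, doc, NULL);
xmlChar *xmlResultBuffer = NULL;
int length = 0; // receives the byte length of the serialized result
xsltSaveResultToString(&xmlResultBuffer, &length, res, sty);
NSString *resultHTML = [NSString stringWithCString:(char *)xmlResultBuffer encoding:NSUTF8StringEncoding];
NSLog(@"Result: %@", resultHTML);

// Release the libxml2/libxslt resources once the NSString has been created.
xmlFree(xmlResultBuffer);
xmlFreeDoc(res);
xmlFreeDoc(doc);
xsltFreeStylesheet(sty);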

Result: <div>a<b</div>

The resulting HTML is not encoded. Could anyone help me fix this?


Solution

  • The problem is that entity references are expanded while the XML string is parsed: each one is replaced with the string value it refers to.

    If your input XML contains entities such as &lt;, they become < as soon as the document is parsed, before the stylesheet ever processes it.

    To avoid this, escape the & itself as an entity as well, that is, write &amp;. Change

    NSString *xml = @"<div>a&lt;b</div>";
    

    to

    NSString *xml = @"<div>a&amp;lt;b</div>";
    

    The parser then resolves &amp;lt; to &lt; and stops there, because entity expansion is a single pass, not an iterative process.
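
If the XML is built at runtime rather than written as a literal, the same replacement can be done in code before the string is handed to the parser. The following is only a minimal sketch in plain C: escape_ampersands is a hypothetical helper, not part of libxml2 or libxslt, and it assumes you want every & in the input doubled up, including those in entities that you might otherwise want expanded.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a libxml2/libxslt function): returns a newly
 * allocated copy of `in` in which every '&' is replaced by "&amp;".
 * The caller must free() the result. Returns NULL if allocation fails. */
char *escape_ampersands(const char *in)
{
    assert(in != NULL);

    /* First pass: count the ampersands so the output can be sized. */
    size_t len = strlen(in);
    size_t amps = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] == '&') {
            amps++;
        }
    }

    /* Each '&' grows by 4 characters, since "&amp;" is 5 characters long. */
    char *out = malloc(len + amps * 4 + 1);
    if (out == NULL) {
        return NULL;
    }

    /* Second pass: copy the input, expanding each '&' as it is met. */
    char *p = out;
    for (size_t i = 0; i < len; i++) {
        if (in[i] == '&') {
            memcpy(p, "&amp;", 5);
            p += 5;
        } else {
            *p++ = in[i];
        }
    }
    *p = '\0';
    return out;
}
```

In the question's code, the escaped buffer (rather than the original string) would be what gets passed to xmlParseMemory. Applying the helper to a whole document is a blunt instrument, so it is probably safer to run it only on the text values you want to survive parsing unchanged.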