I'm parsing a PDF content stream and found that the TJ callback receives an array of strings, so I pop it and iterate over it to get the string values, like so:
    static void Op_TJ(CGPDFScannerRef s, void *info)
    {
        CGPDFArrayRef array;
        bool success = CGPDFScannerPopArray(s, &array);
        if(success) {
            NSMutableString *actualString = [[NSMutableString alloc] init];
            NSLog(@"array count:%zu",CGPDFArrayGetCount(array));
            for(size_t i = 0; i < CGPDFArrayGetCount(array); i++) {
                CGPDFStringRef string;
                CGPDFArrayGetString(array, i, &string);
                NSString *stringData = (NSString *)CGPDFStringCopyTextString(string);
                [actualString appendString:stringData];
                NSLog(@"string Data:%@",stringData);
            }
            NSLog(@"actual string:%@",actualString);
        }
    }
Only problem is, this is my output:
2013-01-11 12:39:49.895 WinPCS Mobile[1617:c07] began text object
2013-01-11 12:39:49.895 WinPCS Mobile[1617:c07] array count:7
2013-01-11 12:39:49.896 WinPCS Mobile[1617:c07] string Data:In
2013-01-11 12:39:49.896 WinPCS Mobile[1617:c07] string Data:In
2013-01-11 12:39:49.896 WinPCS Mobile[1617:c07] string Data:it
2013-01-11 12:39:49.896 WinPCS Mobile[1617:c07] string Data:it
2013-01-11 12:39:49.897 WinPCS Mobile[1617:c07] string Data:ia
2013-01-11 12:39:49.897 WinPCS Mobile[1617:c07] string Data:ia
2013-01-11 12:39:49.897 WinPCS Mobile[1617:c07] string Data:ls
2013-01-11 12:39:49.898 WinPCS Mobile[1617:c07] actual string:InInititiaials
2013-01-11 12:39:49.898 WinPCS Mobile[1617:c07] ended text object
I've resorted to skipping every other index in the for loop, but this is extremely sloppy and seems fragile, so I'm wondering if anyone has a solution or any idea what the problem might be. I've tried multiple PDF files with the same results.
My simple quick fix was to change the for loop from this:
for(int i = 0; i < CGPDFArrayGetCount(array); i++)
to this:
for(int i = 0; i < CGPDFArrayGetCount(array); i+=2)
CGPDFArrayGetString is declared to return a bool: true if there is a PDF string at the specified index, false otherwise.
You're not checking the return value!
My guess is that every other element in the array isn't a PDF string, so the function returns false for it.
In those cases the function doesn't overwrite the string variable, which keeps the value it had in the previous iteration.
Just a guess.
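To make that concrete, here is a minimal sketch of the loop with the return value checked. It assumes the skipped entries are the numeric positioning adjustments that a TJ array can interleave with its strings. It also uses CFBridgingRelease, which is not in the original code, to take ownership of the copy returned by CGPDFStringCopyTextString. Treat it as an untested sketch rather than a definitive fix.

```objc
#import <Foundation/Foundation.h>
#import <CoreGraphics/CoreGraphics.h>

static void Op_TJ(CGPDFScannerRef s, void *info)
{
    CGPDFArrayRef array;
    if (!CGPDFScannerPopArray(s, &array)) {
        return;
    }

    NSMutableString *actualString = [NSMutableString string];
    size_t count = CGPDFArrayGetCount(array);

    for (size_t i = 0; i < count; i++) {
        CGPDFStringRef string = NULL;

        // CGPDFArrayGetString returns false when the element at index i
        // is not a PDF string (assumed here to be a numeric adjustment).
        // In that case `string` is not written, so skip the element
        // instead of reusing a stale value.
        if (!CGPDFArrayGetString(array, i, &string)) {
            continue;
        }

        // The "Copy" function returns an owned CFStringRef; hand it over
        // so it is released (ARC) or autoreleased (manual retain/release).
        NSString *stringData = CFBridgingRelease(CGPDFStringCopyTextString(string));
        if (stringData != nil) {
            [actualString appendString:stringData];
        }
    }

    NSLog(@"actual string:%@", actualString);
}
```

Unlike stepping the index by 2, this does not depend on strings and non-strings strictly alternating, so it should also cope with arrays that start with a number or contain two strings in a row.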