ios pdf cgpdfscanner

Spaces are not detected while scanning PDF - iOS (CGPDFScanner)


I am working on PDF scanning and want to extract text from a PDF. I am using the file Multithreading.pdf for searching. I am able to extract the text, but I am not able to extract the spaces in it. I only get callbacks for the Tj operator and not for TJ. What could be the problem?

Thanks


Solution

  • I am able to extract the text, but I am not able to extract the spaces in it. I only get callbacks for the Tj operator and not for TJ.

    The reasons are that in your sample document

    1. no spaces are used in the text drawing operations but instead the text drawing position is changed using Tm operations; and
    2. only Tj text drawing operations are used, no TJ ones.

    E.g. the text drawing operations for the title on the title page are:

    BT
    /F0 50 Tf
    1 0 0 1 60 669.225 Tm
    (\0006)Tj                                    %  T
    1 0 0 1 83.527 669.225 Tm
    (\000J\000T)Tj                               %  hr
    1 0 0 1 125.631 669.225 Tm
    (\000G\000C\000F\000K\000P\000I)Tj           %  eading
    1 0 0 1 273.395 669.225 Tm
    (\0002)Tj                                    %  P
    1 0 0 1 298.272 669.225 Tm
    (\000T)Tj                                    %  r
    1 0 0 1 313.599 669.225 Tm
    (\000Q)Tj                                    %  o
    1 0 0 1 340.076 669.225 Tm
    (\000I\000T)Tj                               %  gr
    1 0 0 1 382.43 669.225 Tm
    (\000C\000O\000O\000K\000P\000I)Tj           %  amming
    0 Tc
    1 0 0 1 60 609.225 Tm
    (\000\))Tj                                   %  G
    1 0 0 1 91.7 609.225 Tm
    (\000W\000K\000F\000G)Tj                     %  uide
    ET  
    

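    So there is no white space in the Tj text drawing operations; the gaps between words are produced only by shifting the drawing position with Tm. To recover the spaces, a text extractor has to track the Tm positions and infer a space wherever the gap between consecutive text chunks is large enough.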
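One way to act on this is a gap heuristic. The following is only a minimal sketch in plain C, not a definitive implementation and not part of the CGPDFScanner API. It assumes you have already collected, for each Tj call, the decoded text, the x/y translation from the preceding Tm, and the advance width of the chunk. The `TextRun` struct, the `join_runs` function and the `gap_threshold` parameter are hypothetical names introduced for illustration. Computing the width requires the font's glyph widths, which this sketch does not read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One Tj call as seen by the extractor (hypothetical helper type).
   x and y are the translation part of the preceding Tm; width is the
   advance of the whole chunk in the same units. The width is an
   assumption: it has to be computed from the font's width table. */
typedef struct {
    const char *text;   /* already decoded to plain characters */
    double x, y;
    double width;
} TextRun;

/* Joins the runs into out. Inserts '\n' when the baseline changes and
   ' ' when the horizontal gap between the end of the previous run and
   the start of the current one exceeds gap_threshold. Output is
   truncated to fit out_size. Returns the length written, without NUL. */
static size_t join_runs(const TextRun *runs, size_t count,
                        double gap_threshold, char *out, size_t out_size)
{
    assert(runs != NULL && out != NULL && out_size > 0);
    size_t used = 0;

    for (size_t i = 0; i < count; i++) {
        if (i > 0) {
            const TextRun *prev = &runs[i - 1];
            double dy = runs[i].y - prev->y;
            if (dy < 0) dy = -dy;

            char sep = 0;
            if (dy > 0.5) {
                sep = '\n';   /* different baseline: new line */
            } else if (runs[i].x - (prev->x + prev->width) > gap_threshold) {
                sep = ' ';    /* gap too wide to be kerning */
            }
            if (sep != 0 && used + 1 < out_size) {
                out[used++] = sep;
            }
        }

        /* Copy as much of the run as still fits, leaving room for NUL. */
        size_t len = strlen(runs[i].text);
        size_t room = out_size - 1 - used;
        if (len > room) len = room;
        memcpy(out + used, runs[i].text, len);
        used += len;
    }

    out[used] = '\0';
    return used;
}
```

With a CGPDFScanner you would fill the `TextRun` array from callbacks registered for the Tm and Tj operators. A threshold of a fraction of the font size (for instance 0.15 times the 50 set by `/F0 50 Tf`) is a plausible starting point, but it is a guess that needs tuning per document. Content streams that position text with Td, TD or T* instead of Tm would need those operators tracked as well.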