I am working on PDF scanning, where I want to extract text from a PDF. I am using the PDF Multithreading.pdf for searching. I am able to extract the text but am not able to extract spaces from the text. I am getting only callbacks for the Tj operator and not for TJ. What can be the problem?
Thanks
I am able to extract the text but am not able to extract spaces from the text. I am getting only callbacks for the Tj operator and not for TJ.
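The reasons are that in your sample document the text is drawn using Tj operators only, not TJ, and that the strings drawn contain no space characters.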
For example, the text drawing operations of the title page are:
BT
/F0 50 Tf
1 0 0 1 60 669.225 Tm
(\0006)Tj % T
1 0 0 1 83.527 669.225 Tm
(\000J\000T)Tj % hr
1 0 0 1 125.631 669.225 Tm
(\000G\000C\000F\000K\000P\000I)Tj % eading
1 0 0 1 273.395 669.225 Tm
(\0002)Tj % P
1 0 0 1 298.272 669.225 Tm
(\000T)Tj % r
1 0 0 1 313.599 669.225 Tm
(\000Q)Tj % o
1 0 0 1 340.076 669.225 Tm
(\000I\000T)Tj % gr
1 0 0 1 382.43 669.225 Tm
(\000C\000O\000O\000K\000P\000I)Tj % amming
0 Tc
1 0 0 1 60 609.225 Tm
(\000\))Tj % G
1 0 0 1 91.7 609.225 Tm
(\000W\000K\000F\000G)Tj % uide
ET
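As you can see, there is no white space in the strings drawn by the Tj operations. The gaps between words and lines result only from shifts in the drawing position using Tm, e.g. the jump from x = 125.631 to x = 273.395 between "eading" and "P".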
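To recover the spaces you therefore have to look at the positions, not at the strings. The following is only a minimal sketch of that idea in Python, not the API of any particular PDF library: the `Run` type, the `join_runs` function and the `space_factor` threshold are hypothetical names, and the run widths in the example are made-up values, as the real advance widths would have to be computed from the font's width information. The x and y values are the Tm operands from the stream above.

```python
from dataclasses import dataclass

@dataclass
class Run:
    x: float      # horizontal origin, taken from the Tm operands
    y: float      # vertical origin, taken from the Tm operands
    text: str     # decoded text of the Tj string
    width: float  # advance width of the whole run (assumed to be known)

def join_runs(runs, font_size, space_factor=0.2):
    """Concatenate runs; add ' ' for a horizontal gap and a newline for a new line.

    space_factor is an assumed heuristic: a gap wider than
    space_factor * font_size is treated as a word space.
    """
    out = []
    prev = None
    for run in runs:
        if prev is not None:
            if abs(run.y - prev.y) > 0.5 * font_size:
                out.append("\n")   # baseline moved: new line
            elif run.x - (prev.x + prev.width) > space_factor * font_size:
                out.append(" ")    # gap after the previous run: word space
        out.append(run.text)
        prev = run
    return "".join(out)

# Positions from the content stream above; widths are hypothetical.
title_runs = [
    Run(60.0, 669.225, "T", 23.527),
    Run(83.527, 669.225, "hr", 42.104),
    Run(125.631, 669.225, "eading", 133.9),
    Run(273.395, 669.225, "P", 24.877),
    Run(298.272, 669.225, "r", 15.327),
    Run(313.599, 669.225, "o", 26.477),
    Run(340.076, 669.225, "gr", 42.354),
    Run(382.43, 669.225, "amming", 160.0),
    Run(60.0, 609.225, "G", 31.7),
    Run(91.7, 609.225, "uide", 100.0),
]

title = join_runs(title_runs, font_size=50)
print(title)
```

With these assumed widths, the only horizontal gap exceeding the threshold is the one between "eading" and "P", so the sketch yields "Threading Programming" on the first line and "Guide" on the second. A fixed threshold like this is a simplification and may need tuning for other fonts and sizes.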