pdf itext extract braille

Extract Braille text (image) from PDF using iTextSharp


Braille is a writing system for blind people, and in my PDF it is set in a special Braille font. I am trying to decode the text written in the Braille font and output the normal text, but PdfTextExtractor (in iTextSharp) does not seem to handle this font. Is it possible some other way?

I am trying to figure out how I can decode it from a PDF file.

I tried using:

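using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Extract all text from page 1 with the default extraction strategy.
PdfReader pdf = new PdfReader("C:\\pdfs\\file.pdf");
string text = PdfTextExtractor.GetTextFromPage(pdf, 1);

// Both boxes receive the same extracted text.
this.brailleTextBox.Text = text;
this.normalTextBox.Text = text;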

on a PDF file containing text in both a regular font (e.g. Arial) and a Braille font, but it doesn't return the Braille text; it returns just the normal text on the page.

How can I get the Braille font text instead, using iTextSharp?


Solution

  • (not an answer yet)

    Okay, maybe I'm not understanding correctly. I just tried using PdfTextExtractor on the PDF that you provided and it worked correctly. Specifically, the following text was kicked out for page 1:

    B   r    a   i     l    l    e   C   o   d    e   s 
    B r a i l l e C o d e s 
    
    Embossed dot positions as,   
    
    
    A  B   C   D   E   F   G  H   I    J   K  
    A B C D E F G H I J K 
    L    M  N  O   P  Q   R  S   T   U   V  
    L M N O P Q R S T U V 
    W  X   Y   Z 
    W X Y Z 
    
    
    1   2   3    4   5   6    7   8   9   0 
    1 2 3 4 5 6 7 8 9 0
    

    I apologize if I'm misunderstanding you, but are you trying to get the text back as braille?
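
If the goal is instead to pull out only the runs drawn in the Braille font and skip the Arial text, one possible approach is to filter text chunks by font name during extraction. This is a rough sketch, not something tested against this particular PDF. It assumes iTextSharp 5.x, where RenderFilter, FilteredTextRenderListener and TextRenderInfo.GetFont() are available. The names FontNameFilter, NameMatches, BrailleExtractor and GetTextInFont are made up for illustration, and the "Braille" fragment is a guess, so check the actual font name embedded in your file.

```csharp
using System;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical filter: lets through only text drawn with a font whose
// PostScript name contains the given fragment (assumed here: "Braille").
public class FontNameFilter : RenderFilter
{
    private readonly string fragment;

    public FontNameFilter(string fragment)
    {
        this.fragment = fragment;
    }

    // Pure helper so the name check can be exercised without a PDF.
    public static bool NameMatches(string fontName, string fragment)
    {
        return fontName != null &&
               fontName.IndexOf(fragment, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    public override bool AllowText(TextRenderInfo renderInfo)
    {
        // Subset fonts usually carry a prefix such as "ABCDEF+",
        // so a substring match is used rather than equality.
        return NameMatches(renderInfo.GetFont().PostscriptFontName, fragment);
    }
}

public static class BrailleExtractor
{
    // Returns only the text on the page that passes the font-name filter.
    public static string GetTextInFont(string path, int page, string fragment)
    {
        PdfReader reader = new PdfReader(path);
        try
        {
            ITextExtractionStrategy strategy = new FilteredTextRenderListener(
                new LocationTextExtractionStrategy(),
                new FontNameFilter(fragment));
            return PdfTextExtractor.GetTextFromPage(reader, page, strategy);
        }
        finally
        {
            reader.Close();
        }
    }
}
```

With that in place, something like BrailleExtractor.GetTextInFont("C:\\pdfs\\file.pdf", 1, "Braille") could feed brailleTextBox. Judging by the output above, the result would still be ordinary letters (the spaced-out lines), since the Braille font appears to map normal character codes to dot glyphs.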