android pdf

Android: Get text from a PDF


I want to read text from a PDF file stored on the SD card. How can I extract the text from it?

Here is what I tried:

public class MainActivity extends ActionBarActivity implements TextToSpeech.OnInitListener {

    private TextToSpeech tts;
    private String line = null;

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        tts = new TextToSpeech(getApplicationContext(), this);

        final TextView text1 = (TextView) findViewById(R.id.textView1);

        findViewById(R.id.button1).setOnClickListener(new OnClickListener() {

            private String[] arr;

            @Override
            public void onClick(View v) {
                File sdcard = Environment.getExternalStorageDirectory();

                // Get the PDF file from the SD card root
                File file = new File(sdcard, "test.pdf");

                // Read the file line by line as if it were plain text

                StringBuilder text = new StringBuilder();
                try {
                    BufferedReader br = new BufferedReader(new FileReader(file));

                    // int i=0;
                    List<String> lines = new ArrayList<String>();

                    while ((line = br.readLine()) != null) {
                        lines.add(line);
                        // arr[i]=line;
                        // i++;
                        text.append(line);
                        text.append('\n');
                    }
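                    for (String string : lines) {
                        // QUEUE_ADD appends each line to the speech queue;
                        // TextToSpeech.SUCCESS is a status code, not a queue mode
                        tts.speak(string, TextToSpeech.QUEUE_ADD, null);
                    }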
                    arr = lines.toArray(new String[lines.size()]);
                    System.out.println(arr.length);
                    text1.setText(text);

                } catch (Exception e) {
                    e.printStackTrace();
                }

            }
        });

    }

    @Override
    public void onInit(int status) {
        if (status == TextToSpeech.SUCCESS) {
            int result = tts.setLanguage(Locale.US);
            if (result == TextToSpeech.LANG_MISSING_DATA || result == TextToSpeech.LANG_NOT_SUPPORTED) {
                Log.e("TTS", "This Language is not supported");
            } else {
                // speakOut();
            }

        } else {
            Log.e("TTS", "Initialization failed!");
        }
    }

}

Note: This works fine when the file is a plain text file (test.txt), but not when it is a PDF (test.pdf).

With the PDF, the text does not come out as it appears in the document; the output looks like raw binary data. How can I extract the actual text?

Thanks in advance.


Solution

  • I solved this with iText.

    Gradle:

    // On Android Gradle plugin versions before 3.0, use "compile" instead
    implementation 'com.itextpdf:itextg:5.5.10'

    Java:

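      // Requires: import com.itextpdf.text.pdf.PdfReader;
      //           import com.itextpdf.text.pdf.parser.PdfTextExtractor;
      PdfReader reader = null;
      try {
          StringBuilder parsedText = new StringBuilder();
          reader = new PdfReader(yourPdfPath); // path to the PDF, e.g. file.getAbsolutePath()
          int pageCount = reader.getNumberOfPages();
          // iText page numbers start at 1
          for (int page = 1; page <= pageCount; page++) {
              // Extract the text of each page and separate pages with a newline
              parsedText.append(PdfTextExtractor.getTextFromPage(reader, page).trim()).append('\n');
          }
          System.out.println(parsedText);
      } catch (Exception e) {
          e.printStackTrace();
      } finally {
          // Release the reader even if extraction fails
          if (reader != null) {
              reader.close();
          }
      }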
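
The question speaks the file line by line, while the extraction above yields one large string. A minimal sketch of bridging the two, assuming the extracted text is available as a String: the class PdfTextLines and its method toSpeakableLines are hypothetical names introduced here for illustration and are not part of iText or the Android SDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of iText or Android): splits extracted PDF text
// into trimmed, non-empty lines so that each one can be queued for speech.
final class PdfTextLines {

    private PdfTextLines() {
        // Static helper only; no instances needed.
    }

    static List<String> toSpeakableLines(String parsedText) {
        List<String> lines = new ArrayList<>();
        if (parsedText == null) {
            return lines;
        }
        // Split on Windows (\r\n), Unix (\n), and old Mac (\r) line endings.
        for (String line : parsedText.split("\\r?\\n|\\r")) {
            String trimmed = line.trim();
            // Skip blank lines, which would only produce silent utterances.
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        return lines;
    }
}
```

Each returned line could then be passed to tts.speak(line, TextToSpeech.QUEUE_ADD, null) inside the click handler, once onInit has reported TextToSpeech.SUCCESS. Splitting by line keeps each utterance short; very long lines may still need further splitting to stay under the engine's input length limit.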