pdf word-count

PDF word count after a specific word


I have multiple PDF files where I need to count the number of words that come after a specific title or word in the text. For example, if the given title shows up at the top of the second page of a 2-page document, then only the words on page 2 should be counted. Do you know whether any existing word count program already includes such a feature?

Many thanks for your help

Chris


Solution

  • If you are looking for command-line automation:

    1. First convert the PDF to a text file using pdftotext (by default this writes transcript.txt next to the PDF):

      $ pdftotext transcript.pdf

    2. Then use the wc utility to count words:

      $ wc -w transcript.txt
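
The two steps above count every word in the file. To count only the words that follow a given title or word, the text can be filtered before counting. The following is a rough sketch, not a tested tool: `count_words_after` is a made-up helper name, and it assumes the marker is matched as a literal, case-sensitive string, that counting starts right after its first occurrence, and that a "word" is any whitespace-separated token in the extracted text.

```shell
# Hypothetical helper: count the words on stdin that come after the
# first occurrence of a marker string (the marker itself is not counted).
count_words_after() {
  awk -v marker="$1" '
    # Once the marker has been seen, every field on later lines counts.
    found { n += NF; next }
    {
      pos = index($0, marker)
      if (pos > 0) {
        found = 1
        # Count only the words that follow the marker on this line.
        rest = substr($0, pos + length(marker))
        n += split(rest, parts, " ")
      }
    }
    # Print 0 if the marker never appeared.
    END { print n + 0 }
  '
}
```

With the function defined in the shell, it can be fed directly from pdftotext, which writes to standard output when `-` is given as the output file ("Results" here is only a placeholder for your title):

      $ pdftotext transcript.pdf - | count_words_after "Results"

Note that the extracted text may not follow the visual reading order of the page exactly, so the count is only as good as the text pdftotext produces.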