php full-text-search text-search

Search Text In Files Using PHP


How can I search for text in files such as PDF, doc, docx or txt using PHP? I want something similar to Full Text Search in MySQL, but this time I want to search the files directly, not a database.

The search has to cover many files that are located in one folder. Any suggestions, tips or solutions for this problem?

I also noticed that Google searches inside files as well.


Solution

  • To search PDFs you'll need a program like pdftotext, which converts the content of a PDF to plain text. Something similar should be available for Word documents, which also need converting first (because of all the styling and encryption in Word files).

    An example of searching through PDFs, copied from one of my scripts. It's a snippet, not the entire code, but it should give you some understanding: I search for keywords and store the matches in a PDF results array.

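    // $i is presumably the index of the current PDF, set by an outer loop that is not part of this snippet.
    $file    = ABSOLUTE_PATH_SITE."_uploaded/files/Transcripties/".$pdfFiles[$i];

    // Convert the PDF once, not once per keyword. escapeshellarg() quotes the path
    // safely for the shell; the trailing "-" makes pdftotext write to stdout.
    $content = strtolower((string) shell_exec('/usr/bin/pdftotext '.escapeshellarg($file).' -'));

    foreach($keywords as $keyword)
    {
        $result = substr_count($content, strtolower($keyword));

        if($result > 0)
        {
            // $matchesOnPDF holds arrays, so compare against its "pdfFile" column;
            // in_array() on the outer array would never find the file name.
            if(!in_array($pdfFiles[$i], array_column($matchesOnPDF, "pdfFile")))
            {
                array_push($matchesOnPDF, array(
                        "matches"   => $result,
                        "type"      => "PDF",
                        "pdfFile"   => $pdfFiles[$i]));
            }
        }
    }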
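
The question asks about searching every file in a folder, so here is a rough sketch of how the same idea could be extended to a whole directory. It is only an illustration under some assumptions: the function names extractText() and searchFolder() are made up for this example, pdftotext is assumed to be installed and on the PATH of a Unix-like system, and only .txt and .pdf files are handled (Word files would need their own converter).

```php
<?php
// Hypothetical helper: return the lower-cased text of one file,
// or null if the type is unsupported or the text cannot be read.
function extractText(string $path): ?string
{
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    if ($ext === 'txt') {
        $text = file_get_contents($path);
    } elseif ($ext === 'pdf') {
        // Assumption: pdftotext is on the PATH; "-" sends the text to stdout.
        $text = shell_exec('pdftotext ' . escapeshellarg($path) . ' - 2>/dev/null');
    } else {
        return null; // e.g. Word files need a converter of their own
    }
    return is_string($text) ? strtolower($text) : null;
}

// Hypothetical helper: count keyword hits per file in one folder (not recursive).
// Returns an array of file name => total hits, most hits first.
function searchFolder(string $dir, array $keywords): array
{
    $matches = [];
    foreach (scandir($dir) ?: [] as $name) {
        $path = $dir . DIRECTORY_SEPARATOR . $name;
        if (!is_file($path) || ($text = extractText($path)) === null) {
            continue;
        }
        $count = 0;
        foreach ($keywords as $keyword) {
            if ($keyword !== '') { // substr_count() rejects an empty needle
                $count += substr_count($text, strtolower($keyword));
            }
        }
        if ($count > 0) {
            $matches[$name] = $count;
        }
    }
    arsort($matches);
    return $matches;
}
```

Calling searchFolder('/path/to/folder', array('keyword')) would then give one entry per matching file. Converting every PDF on each search gets slow for large folders, so in practice it may be worth caching the extracted text, or storing it in a database and using the full-text search there.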