php pdf pdfparser

How to get the exact 14-character string in PHP


I have a problem extracting a string of exactly 14 characters from the result of PDF Parser. Here is my code:

    $parser = new Parser();
    $pdff = $parser->parseFile($_FILES['file']['tmp_name']);

    $pages = count($pdff->getPages());

    // 14 characters for 2312024VJDU1WR

    $text = $pdff->getPages()[0]->getText();
    $text2 = "BUYERSELLER Return Attempt 1 2 Product Quantity: Weight: Order ID Delivery Attempt 1 2 Pittland 1 HOME 11- JOS_SP AA1 3 3 Central Ship By Pickup -1E-0 16 015- Roberts AIPMC Online South Luzon 4025 Bataan National High School Senior High Sc hool, Balanga City , Bataan, North Luzon Balanga City Bataan North Luzon 2100 RGC Compound, Canlubang Industrial Estate, Brgy . Pi ttland, Cabuyao Laguna, Philippines 4025, Cabuyao, L ShLCabuyao Laguna 2312024VJDU1WR 5,900 g Judith P110331EX6QAE COD Dec 05 Dec 02";

    $words = explode(" ", $text); // spaces are sometimes not counted: L ShLCabuyao Laguna 2312024VJDU1WR 5,900

    // Keep only the words that are exactly 14 characters long
    $fourteenCharacterWords = array_filter($words, function ($word) {
        return strlen($word) === 14;
    });

    $resultString = implode(" ", $fourteenCharacterWords);
    echo $resultString; // prints nothing: no word has exactly 14 characters

The variable $text2 is a sample of the result from PDF Parser. But when I pass $text to $words = explode(" ", $text);, it fails to pick up this ID: 2312024VJDU1WR. I only need 2312024VJDU1WR.


Solution

  • It seems your PDF parser may not be extracting the text the way you expect, or the text may contain formatting differences such as separators other than a plain space. Instead of splitting on spaces, you can use a regular expression to match the pattern you're looking for directly. Here's an updated version:

    $parser = new Parser();
    $pdff = $parser->parseFile($_FILES['file']['tmp_name']);
    $text = $pdff->getPages()[0]->getText();
    
    // Define the pattern for matching 14 characters
    $pattern = '/\b([A-Za-z0-9]{14})\b/';
    
    // Find all matches in the text
    preg_match_all($pattern, $text, $matches);
    
    // $matches[1] will contain an array of all matched 14-character strings
    $resultString = implode(" ", $matches[1]);
    echo $resultString;
    

    This regular expression ('/\b([A-Za-z0-9]{14})\b/') looks for runs of exactly 14 alphanumeric characters. The \b (word boundary) at each end ensures that a match is a whole word and not part of a longer one. Because it does not depend on splitting by a literal space, it should capture the desired ID even if the spacing or formatting of the extracted text varies.
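
To illustrate why explode(" ", ...) can miss the ID while the pattern finds it, here is a minimal, self-contained sketch. It assumes the extracted text separates some fields with a tab, a newline, or a non-breaking space rather than a plain space. That is a guess about what getText() may return, not confirmed output. The $sample string and the helper names idsBySpace, idsByWhitespace, and idsByPattern are hypothetical and are not part of PDF Parser.

```php
<?php
// Hypothetical sample: fields separated by a tab, a newline and a
// non-breaking space (U+00A0) instead of plain spaces (an assumption).
$sample = "Cabuyao Laguna\t2312024VJDU1WR\n5,900 g Judith\u{00A0}P110331EX6QAE COD";

// Original approach: split on a literal space, keep 14-byte tokens.
function idsBySpace(string $text): array
{
    return array_values(array_filter(explode(" ", $text), function ($word) {
        return strlen($word) === 14;
    }));
}

// Alternative: split on any run of whitespace, including U+00A0.
function idsByWhitespace(string $text): array
{
    $tokens = preg_split('/[\s\x{00A0}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_filter($tokens, function ($word) {
        return strlen($word) === 14;
    }));
}

// Approach from the answer: match 14 alphanumeric characters directly.
function idsByPattern(string $text): array
{
    preg_match_all('/\b[A-Za-z0-9]{14}\b/', $text, $matches);
    return $matches[0];
}

echo count(idsBySpace($sample)), PHP_EOL;              // prints: 0
echo implode(" ", idsByWhitespace($sample)), PHP_EOL;  // prints: 2312024VJDU1WR
echo implode(" ", idsByPattern($sample)), PHP_EOL;     // prints: 2312024VJDU1WR
```

With this sample, splitting on a literal space leaves "Laguna\t2312024VJDU1WR\n5,900" as a single token, so nothing has length 14. The 13-character P110331EX6QAE is skipped by both working variants. To see which separators your real text contains, inspecting it with var_dump(json_encode($text)) or bin2hex($text) may help.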