Tags: php, laravel-5, pdf-reader, pdfparser

Read a string by white spaces in PHP


I am trying to read a PDF with the \Smalot\PdfParser\Parser() library in Laravel 5.6.

I am getting all the content correctly, but it comes out like this:

Array
(
    [0] =>  MARTIN CARRILLO MARIA ESMERALDA ALHAMBRA 10 958 54 38 93
    [1] =>  ESPIGARES DIAS JOSE ANTONIO ALHAMBRA 11 958 54 33 32
    [2] =>  GUTIERREZ TITOS JOSE MANUEL ALHAMBRA 12 958 54 04 10
    [3] =>  MARTIN COBOS ANTONIO ALHAMBRA 18 958 54 33 28
    [4] =>  GOMEZ CARRILLO JOSE ALHAMBRA 20 958 54 32 72
    [5] =>  RODRIGUEZ RUANO BUENAVENTURA ALHAMBRA 21 958 54 35 86
    [6] =>  GARCIA ARIAS MARIA ISABEL ALHAMBRA 22 958 54 07 87
    [7] =>  RODRIGUEZ JIMENEZ MIGUEL ALHAMBRA 27 958 54 08 77
    [8] =>  GUTIERREZ FERNANDEZ AMANDA HILDA ALHAMBRA 3 958 49 98 62
    [9] =>  DIAZ FLORIAN DOLORES ALHAMBRA 30 958 54 33 99
    [10] =>  PEREZ LASTRA SAUL ADAN ALHAMBRA 32 958 54 31 46
    [11] =>  AMIGO MOLINA MARIA MATILDE ALHAMBRA 35 958 54 31 91
    [12] =>  ESPIGARES GOMEZ JORGE ALHAMBRA 37 958 54 02 22
    [13] =>  ESPIGARES ESPIGARES JOSE ALHAMBRA 40 958 54 31 83

But I need to get the second surname, surname, name, address and phone from these strings. I think I can extract each field by counting the white spaces.

I am trying this:

// loop to get data line
foreach($dataByLine as &$line){      
    // delete first element in string
    $cleanString = substr($line, 0, 7);
    $line = str_replace($cleanString, "", $line);   
    
    $line = preg_replace('/^[\d]+/','',$line);
    //$line = ltrim($line,"0123456789");
    $stringDivide = preg_split('/\s+/', $line, -1, PREG_SPLIT_NO_EMPTY);

    
    //array_push($secondSurnameArray, $stringDivide[0]);

    array_push($dataList, $line);
}

print_r($dataList);

But this returns:

Array
(
    [0] => MARTIN
    [1] => CARRILLO
    [2] => MARIA
    [3] => ESMERALDA
    [4] => ALHAMBRA
    [5] => 10
    [6] => 958
    [7] => 54
    [8] => 38
    [9] => 93
)
Array
(
    [0] => ESPIGARES
    [1] => DIAS
    [2] => JOSE
    [3] => ANTONIO
    [4] => ALHAMBRA
    [5] => 11
    [6] => 958
    [7] => 54
    [8] => 33
    [9] => 32
)
Array
(
    [0] => GUTIERREZ
    [1] => TITOS
    [2] => JOSE
    [3] => MANUEL
    [4] => ALHAMBRA
    [5] => 12
    [6] => 958
    [7] => 54
    [8] => 04
    [9] => 10
)

for example. But if I do this:

array_push($secondSurnameArray, $stringDivide[0]);

I get an error saying that index 0 is not defined.

I need this data to build one array for an insert query.

Thanks for reading and for your help, and sorry for my English.

Update

I am hoping to get this:

$secondSurname = ['MARTIN', 'ESPIGARES', 'GUTIERREZ'....etc]
$surname = ['CARRILLO', 'DIAS', 'TITOS', .....etc]
$name = ['JOSE', 'JOSE', 'ANTONIO']
$addres = ['ALHAMBRA 10', 'ALHAMBRA 11', 'ALHAMBRA 12', etc]
$phone = ['958 54 38 93', '958 54 33 32', '958 54 04 10']

After that I need to merge $secondSurname, $surname and $name to build the insert query.


Solution

  • I assume you are trying to lose the additional surname when there are two, so that is easily done in the loop.

    Merging the parts that make up the phone number can simply be done there as well.

    $dataByLine = [ 
        'MARTIN CARRILLO MARIA ESMERALDA ALHAMBRA 10 958 54 38 93',
        'ESPIGARES DIAS JOSE ANTONIO ALHAMBRA 11 958 54 33 32',
        'GUTIERREZ TITOS JOSE MANUEL ALHAMBRA 12 958 54 04 10',
        'MARTIN COBOS ANTONIO ALHAMBRA 18 958 54 33 28',
        'GOMEZ CARRILLO JOSE ALHAMBRA 20 958 54 32 72',
        'RODRIGUEZ RUANO BUENAVENTURA ALHAMBRA 21 958 54 35 86',
        'GARCIA ARIAS MARIA ISABEL ALHAMBRA 22 958 54 07 87',
        'RODRIGUEZ JIMENEZ MIGUEL ALHAMBRA 27 958 54 08 77',
        'GUTIERREZ FERNANDEZ AMANDA HILDA ALHAMBRA 3 958 49 98 62',
        'DIAZ FLORIAN DOLORES ALHAMBRA 30 958 54 33 99',
        'PEREZ LASTRA SAUL ADAN ALHAMBRA 32 958 54 31 46',
        'AMIGO MOLINA MARIA MATILDE ALHAMBRA 35 958 54 31 91',
        'ESPIGARES GOMEZ JORGE ALHAMBRA 37 958 54 02 22',
        'ESPIGARES ESPIGARES JOSE ALHAMBRA 40 958 54 31 83'
    ];
    
    $secondSurname  = [];
    $surname        = [];
    $name           = [];
    $address        = [];
    $phone          = [];
    
    foreach($dataByLine as $line){      
        $t = explode(' ', $line);   # make array from string using space as separator
        // fix double surname issue
        if ( count($t) == 10 ){
            // we have a line with a double surname
            array_shift($t);   # drop the first name in list
        }
        $secondSurname[]    = $t[0];
        $surname[]          = $t[1];
        $name[]             = $t[2];
        $address[]          = $t[3] . ' ' . $t[4];   # street name plus house number
        $phone[]            = $t[5] . $t[6] . $t[7] . $t[8];   # the four phone groups
        # you can add a space and/or `-` between the phone groups if you like
    }
    
    echo 'SecondSurnames ';
    print_r($secondSurname);
    echo 'Surnames ';
    print_r($surname);
    echo 'names ';
    print_r($name);
    echo 'address ';
    print_r($address);
    echo 'phone ';
    print_r($phone);
    

    RESULTS

    SecondSurnames Array
    (
        [0] => CARRILLO
        [1] => DIAS
        [2] => TITOS
        [3] => MARTIN
        [4] => GOMEZ
        [5] => RODRIGUEZ
        [6] => ARIAS
        [7] => RODRIGUEZ
        [8] => FERNANDEZ
        [9] => DIAZ
        [10] => LASTRA
        [11] => MOLINA
        [12] => ESPIGARES
        [13] => ESPIGARES
    )
    Surnames Array
    (
        [0] => MARIA
        [1] => JOSE
        [2] => JOSE
        [3] => COBOS
        [4] => CARRILLO
        [5] => RUANO
        [6] => MARIA
        [7] => JIMENEZ
        [8] => AMANDA
        [9] => FLORIAN
        [10] => SAUL
        [11] => MARIA
        [12] => GOMEZ
        [13] => ESPIGARES
    )
    names Array
    (
        [0] => ESMERALDA
        [1] => ANTONIO
        [2] => MANUEL
        [3] => ANTONIO
        [4] => JOSE
        [5] => BUENAVENTURA
        [6] => ISABEL
        [7] => MIGUEL
        [8] => HILDA
        [9] => DOLORES
        [10] => ADAN
        [11] => MATILDE
        [12] => JORGE
        [13] => JOSE
    )
    address Array
    (
        [0] => ALHAMBRA 10
        [1] => ALHAMBRA 11
        [2] => ALHAMBRA 12
        [3] => ALHAMBRA 18
        [4] => ALHAMBRA 20
        [5] => ALHAMBRA 21
        [6] => ALHAMBRA 22
        [7] => ALHAMBRA 27
        [8] => ALHAMBRA 3
        [9] => ALHAMBRA 30
        [10] => ALHAMBRA 32
        [11] => ALHAMBRA 35
        [12] => ALHAMBRA 37
        [13] => ALHAMBRA 40
    )
    phone Array
    (
        [0] => 958543893
        [1] => 958543332
        [2] => 958540410
        [3] => 958543328
        [4] => 958543272
        [5] => 958543586
        [6] => 958540787
        [7] => 958540877
        [8] => 958499862
        [9] => 958543399
        [10] => 958543146
        [11] => 958543191
        [12] => 958540222
        [13] => 958543183
    )
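The loop above relies on fixed positions and drops the first word of ten-word lines. If the goal is instead to keep both surnames and treat the extra word as a second given name, one possible alternative is to read the fields from the right-hand end of each line. This is only a sketch under stated assumptions: every line ends with four phone groups, preceded by a one-word street name and a house number, and starts with two surnames. The helper name `parseLine` and the keys `surname1` and `surname2` are hypothetical. It also returns `null` for blank or short lines; such lines in the PDF text may be one cause of the "index 0 is not defined" error, though the question does not confirm that.

```php
<?php
// Hypothetical helper: splits one line into fields, reading from the right.
// Assumes two surnames first, then one or more given names, then a one-word
// street name, a house number and four phone groups.
function parseLine(string $line): ?array
{
    $t = preg_split('/\s+/', trim($line), -1, PREG_SPLIT_NO_EMPTY);

    // 2 surnames + 1 name + 2 address words + 4 phone groups = 9 words minimum
    if (count($t) < 9) {
        return null;   // blank or malformed line: nothing safe to index into
    }

    $phone   = implode(' ', array_splice($t, -4));   // removes and returns the last four words
    $address = implode(' ', array_splice($t, -2));   // then the street name and house number
    $first   = array_shift($t);                      // first word of the line
    $second  = array_shift($t);                      // second word of the line

    return [
        'surname1' => $first,
        'surname2' => $second,
        'name'     => implode(' ', $t),   // whatever is left: one or more given names
        'address'  => $address,
        'phone'    => $phone,
    ];
}

// Usage with one nine-word line from the question
print_r(parseLine('MARTIN COBOS ANTONIO ALHAMBRA 18 958 54 33 28'));
```

Because the name is whatever remains after the other fields are removed, the same code handles both nine-word and ten-word lines without a special case.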
    

    NOTE: I am not sure these separate arrays are really the best thing to pass to your query binding, which is why I asked to see the query. It may make more sense to have a single array with one entry for each person.
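To illustrate the note, here is a minimal sketch of turning five parallel arrays into a single array with one entry per person. The sample values are taken from the first three lines of the question, and the array keys are hypothetical column names, since the actual query was never shown.

```php
<?php
// Sample values only: the first three people from the question.
$secondSurname = ['MARTIN', 'ESPIGARES', 'GUTIERREZ'];
$surname       = ['CARRILLO', 'DIAS', 'TITOS'];
$name          = ['MARIA ESMERALDA', 'JOSE ANTONIO', 'JOSE MANUEL'];
$address       = ['ALHAMBRA 10', 'ALHAMBRA 11', 'ALHAMBRA 12'];
$phone         = ['958 54 38 93', '958 54 33 32', '958 54 04 10'];

// array_map with several arrays walks them in parallel, so each call
// receives the values that belong to the same person.
// The keys below are hypothetical column names.
$rows = array_map(
    function ($s2, $s1, $n, $a, $p) {
        return [
            'second_surname' => $s2,
            'surname'        => $s1,
            'name'           => $n,
            'address'        => $a,
            'phone'          => $p,
        ];
    },
    $secondSurname, $surname, $name, $address, $phone
);

print_r($rows[0]);
```

An array of associative arrays like `$rows` is the shape Laravel's query builder accepts for a multi-row insert, for example `DB::table('people')->insert($rows)`, where the table name `people` is an assumption.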