php regex serialization preg-match-all recursive-regex

`preg_match_all()` yields incorrect/null results


The question

Why do I get the wrong results? How can I get the desired results?

The input file

vulture (wing)
tabulations: one leg; two legs; flying
father; master; patriarch
mat (box)
pedistal; blockade; pilar
animal belly (oval)
old style: naval
jackal's belly; jester    slope of hill (arch)
key; visible; enlightened

The two versions of the code, with issue

Version 1:

<?php
$filename = "fakexample.txt";
$file = fopen($filename, "rb");
$myFile = fread($file, filesize($filename));
//PLEASE NOTE THAT $STARTSAFE IS IN PLACE IN THIS VERSION AND NOT IN THE NEXT
function get_between($startString, $endString, $myFile, $startSafe, $endSafe){
  //CHANGES WILL START HERE
  if($startSafe = 0){
    $startStringSafe = $startString;
  }
  elseif($startSafe = 1){
    $startStringSafe = preg_quote($startString, '/');
  }
  //AND END HERE
  if($endSafe = 0){
    $endStringSafe = $endString;
  }
  elseif($endSafe = 1){
    $endStringSafe = preg_quote($endString, '/');
  }
  //non-greedy match any character between start and end strings. 
  //s modifier should make it also match newlines.
  preg_match_all("/$startStringSafe(.*?)$endStringSafe/m", $myFile, $matches);
  return $matches;
}
$list = get_between("^", ")", $myFile, 0, 1);
foreach($list[1] as $list){
  echo $list."\n";
}
?>

This code returns nothing.

Version 2:

<?php
$filename = "fakexample.txt";
$file = fopen($filename, "rb");
$myFile = fread($file, filesize($filename));
//PLEASE NOTE THAT $STARTSAFE IS NOT IN PLACE IN THIS VERSION
function get_between($startString, $endString, $myFile, $startSafe, $endSafe){
  //CHANGES START HERE

    $startStringSafe = $startString;



  //AND END HERE
  if($endSafe = 0){
    $endStringSafe = $endString;
  }
  elseif($endSafe = 1){
    $endStringSafe = preg_quote($endString, '/');
  }
  //non-greedy match any character between start and end strings. 
  //s modifier should make it also match newlines.
  preg_match_all("/$startStringSafe(.*?)$endStringSafe/m", $myFile, $matches);
  return $matches;
}
$list = get_between("^", ")", $myFile, 0, 1);
foreach($list[1] as $list){
  echo $list."\n";
}
?>

This returns:

vulture (wing mat (box animal belly (oval jackal's belly; jester slope of hill (arch

The desired output

vulture mat animal belly jackal's belly; jester slope of hill

Marked differences

vulture **(wing** mat **(box** animal belly **(oval** jackal's belly; jester slope of hill **(arch**


Solution

  • By default, `^` matches only at the start of the subject string. If you want it to match at the start of every line, you have to add the `m` (multiline) modifier after the closing delimiter of your regex.
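A minimal sketch of how both versions could be repaired, offered as one possible reading rather than the definitive fix. Two things seem to be going on in the question's code. First, `if($startSafe = 0)` assigns instead of comparing, so the `elseif($startSafe = 1)` branch always runs and `^` gets escaped by `preg_quote()` into a literal caret, which would explain why Version 1 matches nothing. Second, ending the match at `)` keeps the `(wing` part in the capture; ending at ` (` (a space followed by the opening parenthesis) drops it. The boolean parameters, the inlined sample text, and the choice of ` (` as the end string are assumptions made for illustration.

```php
<?php
// Sample input inlined so the sketch runs on its own
// (assumption: the question reads the same text from fakexample.txt).
$myFile = <<<'TXT'
vulture (wing)
tabulations: one leg; two legs; flying
father; master; patriarch
mat (box)
pedistal; blockade; pilar
animal belly (oval)
old style: naval
TXT;

// Returns the text captured between $startString and $endString.
// $quoteStart / $quoteEnd say whether each delimiter is literal text
// (escape it with preg_quote) or already a regex fragment (use as-is).
function get_between(string $startString, string $endString, string $subject,
                     bool $quoteStart, bool $quoteEnd): array
{
    // A ternary on a boolean avoids the "=" versus "==" mistake entirely.
    $start = $quoteStart ? preg_quote($startString, '/') : $startString;
    $end   = $quoteEnd   ? preg_quote($endString, '/')   : $endString;

    // m modifier: ^ matches at the start of every line, not only the subject.
    // No s modifier: "." does not cross newlines, so each match stays on one line.
    preg_match_all("/{$start}(.*?){$end}/m", $subject, $matches);

    return $matches[1];
}

// "^" is a regex anchor, so it must NOT be quoted; " (" is literal text, so it is.
$names = get_between('^', ' (', $myFile, false, true);

foreach ($names as $name) {
    echo $name, "\n";
}
// Prints:
// vulture
// mat
// animal belly
```

Lines without a ` (` on them, such as `father; master; patriarch`, are skipped because `.` cannot cross a newline to reach the next parenthesis.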