
Recursive regex with garbled text surrounding? Getting "ArrayArray"


I asked a similar question, but it was closed as too broad. I have several related questions, so I'm hoping that asking just one will be easier. I've tried a few different ways to solve this, but none of them actually works.

I have a text file with a lot of data. The only data I'm interested in falls between parentheses, "(" and ")". How can I put each piece of text that lies between a pair of parentheses into an array?

The code I'm using right now prints "ArrayArray":

function get_between($startString, $endString, $myFile){
  preg_match_all('/\$startString([^$endString]+)\}/', $myFile, $matches);
  return $matches;
}
$myFile = file_get_contents('explode.txt');
$list = get_between("&nbsp(", ")", $myFile);
foreach($list as $list){
  echo $list;
}

Solution

    <?php
    function get_between($startString, $endString, $myFile){
      //Escape start and end strings so regex metacharacters such as ( are matched literally.
      $startStringSafe = preg_quote($startString, '/');
      $endStringSafe = preg_quote($endString, '/');
      //Non-greedy: match as few characters as possible between the start and end strings.
      //The s modifier makes . also match newlines.
      preg_match_all("/$startStringSafe(.*?)$endStringSafe/s", $myFile, $matches);
      return $matches;
    }
    $myFile = 'fkdhkvdf(mat(((ch1)vdsf b(match2) dhdughfdgs (match3)';
    $list = get_between("(", ")", $myFile);
    //$list[1] holds the text captured by (.*?); use a separate loop variable.
    //Prints mat(((ch1, match2 and match3, one per line.
    foreach($list[1] as $match){
      echo $match."\n";
    }
    

    I did this and it seems to work. (Obviously, you'll need to replace my $myFile assignment line with your file_get_contents statement.) A few things:

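    A: Variables are not interpolated inside single-quoted strings, so your preg_match_all pattern won't work: it literally looks for the text $startString instead of the value you passed in. Even with double quotes, characters such as ( are regex metacharacters, which is why the code above escapes both strings with preg_quote(). Also, [^$endString] is a character class, so it excludes the individual characters of the end string rather than the string as a whole. (I also removed the check for } at the end of the matched string. If you need it, add it back as \\} just before the ending delimiter.)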

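    B: $list will be an array of arrays. With the default PREG_PATTERN_ORDER flag, index zero holds all the full matches and index one holds everything captured by the first subpattern. That is also why your loop printed "ArrayArray": each element of $matches is itself an array, and echoing an array prints the word Array (along with an "Array to string conversion" notice or warning, depending on your PHP version).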

    C: This only works as long as $endString never appears inside a subpattern you are trying to match. For example, if you expect (matc(fF)) to give you matc(fF), it won't; you'll get matc(fF instead. You'll need a more powerful parser if you want the former result.
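Since the title asks about recursive regex: PCRE, the engine behind PHP's preg_* functions, supports recursive patterns, which can handle the nested case from point C. Below is a minimal sketch, assuming the delimiters are plain ( and ) and the parentheses you care about are balanced. get_balanced() is a hypothetical helper name, not part of the answer above. (?R) recurses into the whole pattern, and the possessive [^()]++ keeps backtracking under control.

```php
<?php
// Hypothetical helper: returns the text inside each outermost balanced (...) pair.
// Assumes plain "(" and ")" delimiters; no prefix such as &nbsp; is handled here.
function get_balanced(string $text): array
{
    // \(             an opening parenthesis
    // (              group 1: the text inside the outermost pair
    //   (?:[^()]++   a run of non-parenthesis characters (possessive)
    //   |(?R))*      or a nested balanced group, by recursing into the whole pattern
    // )\)            end of group 1, then the matching closing parenthesis
    preg_match_all('/\(((?:[^()]++|(?R))*)\)/', $text, $matches);
    return $matches[1];
}

$text = 'junk (matc(fF)) more junk (match2)';
foreach (get_balanced($text) as $inner) {
    echo $inner, "\n"; // matc(fF), then match2
}
```

On the answer's own test string, which has unbalanced parentheses, the unmatched ( characters are simply skipped, so this returns ch1, match2 and match3 rather than mat(((ch1.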

    Edit: The get_between function here should work with &nbsp;( and )} as well, or whatever else you'd want.
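A quick check of the edit's claim, which also shows the two index levels described in point B. The sample text is made up for illustration, and get_between() is repeated from the answer so the snippet runs on its own. Note that the question passes "&nbsp(" without a semicolon; use whatever your file actually contains.

```php
<?php
// Same get_between() as in the answer, repeated so this snippet is self-contained.
function get_between($startString, $endString, $myFile){
    $startStringSafe = preg_quote($startString, '/');
    $endStringSafe = preg_quote($endString, '/');
    preg_match_all("/$startStringSafe(.*?)$endStringSafe/s", $myFile, $matches);
    return $matches;
}

// Made-up sample text; the second entry contains a lone ")" before the real ")}".
$myFile = "junk &nbsp;(first)} more junk &nbsp;(sec)ond)} end";
$list = get_between("&nbsp;(", ")}", $myFile);

// $list[0]: full matches, delimiters included.
foreach ($list[0] as $full) {
    echo $full, "\n";
}
// $list[1]: only the text captured by (.*?).
foreach ($list[1] as $inner) {
    echo $inner, "\n";
}
```

Because the end string here is ")}", a lone ")" inside the data does not end the match early, so the second capture is sec)ond.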