
PHP: get the contents of a function with a regexp, is it possible?


I have a regex that finds all function definitions. Now I also want to capture the contents of each function, e.g. as a third group in $matches. Is that possible with a regex, or do I need some kind of push-pop machine because the {} brackets can be nested? My goal is a script that analyzes PHP code and works out which functions depend on which others. If such a script already exists, please let me know!

$content = file_get_contents($fileName);
preg_match_all("/(function )(\w+\(.*?\))/", $content, $matches);

I don't want to use the PHP tokenizer because it also reports "hidden" functions, such as predefined ones, and I only want the functions written in the code itself.
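For what it's worth, token_get_all() only tokenizes the string it is given. Calls such as strlen() still show up as T_STRING tokens, but only declarations are preceded by a T_FUNCTION token, so built-in functions are not reported as declared. A minimal sketch, assuming PHP 7 or later; declaredFunctionNames() is a hypothetical helper name. It skips closures, but it also reports class methods and `use function foo;` imports, which a real dependency tool would need to filter.

```php
<?php
// Sketch: list the names of functions declared in a PHP source string.
// The source must start with an open tag, as PHP files normally do.

function declaredFunctionNames(string $source): array
{
    $tokens = token_get_all($source);
    $count = count($tokens);
    $skippable = [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT];
    $names = [];

    for ($i = 0; $i < $count; $i++) {
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_FUNCTION) {
            continue;
        }
        // Skip whitespace, comments and a "&" (return by reference).
        // "&" may be a plain string or an array token depending on the PHP version.
        $j = $i + 1;
        while ($j < $count) {
            $t = $tokens[$j];
            $text = is_array($t) ? $t[1] : $t;
            if ($text !== '&' && !(is_array($t) && in_array($t[0], $skippable, true))) {
                break;
            }
            $j++;
        }
        // Named declarations continue with a T_STRING; closures with "(".
        if ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_STRING) {
            $names[] = $tokens[$j][1];
        }
    }
    return $names;
}

// The strlen() call is not reported: it is not preceded by "function".
print_r(declaredFunctionNames('<?php function foo() { return strlen("x"); }'));
```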


Solution

  • Even if, for better or worse, you're not Noam Chomsky, you should understand this:

    PHP is not a regular language, so it cannot be expressed or parsed by regular expressions.

    To be a regular language, a language needs to be, among other things, context-free.

    [Figure: language hierarchies (the Chomsky hierarchy)]

    "Context free" means that a "word" in the language means the same thing regardless of where it occurs. This is not the case for PHP. In fact, even your simple snippet to find function signatures already crashes and burns here:

    // function foo()
    

    Inside a comment, the function keyword loses its usual meaning. The same goes for strings and heredocs:

    'function foo()';
    <<<HERE
        function foo()
    HERE;
    

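    and a host of similar examples. Whether the function keyword (or anything else) means what it seems to depends on its context, and the bodies you want are nested {} blocks that can only be matched by tracking how deep you are, i.e. the push-pop machine you suspected. That puts PHP beyond what regular expressions can feasibly parse.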

    Use a parser.
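To get the bodies the question asks about, a parser-like approach can be sketched with PHP's own tokenizer plus the brace counter the question suspected: once the tokenizer has dealt with comments, strings and heredocs, counting { and } tokens is enough. This is a minimal sketch assuming PHP 7 or later; functionBodies() is a hypothetical helper, not a library API. It skips closures and bodiless (abstract or interface) declarations and ignores namespaces and classes, so methods with the same name in different classes overwrite each other. For serious analysis, a full parser such as nikic/PHP-Parser is the sturdier choice.

```php
<?php
// Sketch: map each named function declared in $source to its body text
// (including the outer braces), using token_get_all() and a depth counter.

function functionBodies(string $source): array
{
    $tokens = token_get_all($source);
    $count = count($tokens);
    $bodies = [];

    for ($i = 0; $i < $count; $i++) {
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_FUNCTION) {
            continue;
        }
        // The name is the first T_STRING before "("; closures have none.
        $name = null;
        for ($j = $i + 1; $j < $count && $tokens[$j] !== '('; $j++) {
            if (is_array($tokens[$j]) && $tokens[$j][0] === T_STRING) {
                $name = $tokens[$j][1];
                break;
            }
        }
        if ($name === null) {
            continue;
        }
        // Advance to the body's "{"; reaching ";" first means there is no body.
        while ($j < $count && $tokens[$j] !== '{' && $tokens[$j] !== ';') {
            $j++;
        }
        if ($j >= $count || $tokens[$j] !== '{') {
            continue;
        }
        // Push on every opener, pop on every "}", stop when depth is back to 0.
        // "{$" and "${" inside double-quoted strings are closed by a plain "}".
        $depth = 0;
        $body = '';
        for (; $j < $count; $j++) {
            $token = $tokens[$j];
            if ($token === '{' || (is_array($token)
                    && in_array($token[0], [T_CURLY_OPEN, T_DOLLAR_OPEN_CURLY_BRACES], true))) {
                $depth++;
            } elseif ($token === '}') {
                $depth--;
            }
            $body .= is_array($token) ? $token[1] : $token;
            if ($depth === 0) {
                break;
            }
        }
        $bodies[$name] = $body;
    }
    return $bodies;
}

print_r(functionBodies('<?php function add($a, $b) { return $a + $b; }'));
```

With the question's variables, usage would be `functionBodies(file_get_contents($fileName))`. Braces inside comments and single-quoted strings arrive as part of larger tokens, so they never disturb the count.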