php debugging abstract-syntax-tree php-parser

How to get variable name and value in AST


I'm using PHP-Parser to generate the AST of a PHP program. For example:


<?php
use PhpParser\Error;
use PhpParser\NodeDumper;
use PhpParser\ParserFactory;

$code = <<<'CODE'
<?php
$variable = $_POST['first'];
$new = $nonexist; 
CODE;

$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
try {
    $ast = $parser->parse($code);
} catch (Error $error) {
    echo "Parse error: {$error->getMessage()}\n";
    return;
}

$dumper = new NodeDumper;
echo $dumper->dump($ast) . "\n";

The AST dumped for the above example is as follows:

array(
    0: Stmt_Expression(
        expr: Expr_Assign(
            var: Expr_Variable(
                name: variable
            )
            expr: Expr_ArrayDimFetch(
                var: Expr_Variable(
                    name: _POST
                )
                dim: Scalar_String(
                    value: first
                )
            )
        )
    )
    1: Stmt_Expression(
        expr: Expr_Assign(
            var: Expr_Variable(
                name: new
            )
            expr: Expr_Variable(
                name: nonexist
            )
        )
    )
)

What I'm trying to find is variable = _POST and new = nonexist. I used the leaveNode function to reach _POST and variable. My code to find _POST and variable is as follows:

public function leaveNode(Node $node)
    {
        $collect_to_print= array();

        if ($node instanceof ArrayDimFetch
            && $node->var instanceof Variable
            && $node->var->name === '_POST')
        {
            $variableName = (string) $node->var->name;
            $collect_to_print[$node->dim->value] = $node->var->name; // will store the variables in array in a way to print them all later such as variable => _POST , how to get the name `variable` in this case
            return $node;
        }
        else
            if ($node instanceof Variable
        && !($node->var->name === '_POST' ))
        {
            $collect_to_print[$node->name] = 'Empty' ;
        }

    }

So far, my results show every variable on a separate line, as follows:

variable => 
first => _POST  // This _POST should be the value of variable (above)
new => Empty
nonexist => Empty

However, I expect the result to be:

variable => _POST
new => Empty
nonexist => Empty

Any help would be appreciated.


Solution

  • This is a lot more complicated than other questions you've asked, but it has been interesting to learn about how to write it.

    I've put comments through the code, but basically it walks the AST and looks for assignments (PhpParser\Node\Expr\Assign nodes). It splits each assignment into its left and right parts and recursively extracts any variables from either part.

    The code allows for nested variables on either side of the expression. I've changed the example code to provide some broader examples.

    Comments in code (assumes some knowledge of how the parser works with nodes etc.)...

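    // Imports needed in addition to those in the question's script
    use PhpParser\Node;
    use PhpParser\NodeTraverser;
    use PhpParser\NodeVisitorAbstract;

    $traverser = new NodeTraverser;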
    
    class ExtractVars extends NodeVisitorAbstract {
        private $prettyPrinter = null;
    
        private $variables = [];
        private $expressions = [];
    
        public function __construct() {
            $this->prettyPrinter = new PhpParser\PrettyPrinter\Standard;
        }
    
        public function leaveNode(Node $node) {
            if ( $node instanceof PhpParser\Node\Expr\Assign ) {
                // Variables referenced on the left-hand side
                $assignVars = $this->extractVarRefs ( $node->var );
                // Source text of the assignment target, used as the array key
                $assign = $this->prettyPrinter->prettyPrintExpr($node->var);
                // Store the variables associated with the left-hand side
                $this->expressions[$assign]["to"] = $assignVars;
                // Append the variables found on the right-hand side
                $this->expressions[$assign][] = $this->extractVarRefs ( $node->expr );
            }
        }
    
        private function extractVarRefs ( Node $node ) : array  {
            $variableList = [];
            // If it's a variable, store the name
            if ( $node instanceof PhpParser\Node\Expr\Variable )   {
                $variable = $this->prettyPrinter->prettyPrintExpr($node);
                $this->variables[] = $variable;
                $variableList[] = $variable;
            }
            // Look for any further variables in the node
            foreach ( $node->getSubNodeNames() as $newNodeName )   {
                $newNode = $node->$newNodeName;
                if ( $newNode instanceof Node && $newNode->getSubNodeNames())   {
                    // Recursive call to extract variables
                    $toAdd = $this->extractVarRefs ( $newNode );
                    // Add new list to current list
                    $variableList = array_merge($variableList, $toAdd);
                }
            }
            return $variableList;
        }
    
        public function getVariables() : array  {
            return array_unique($this->variables);
        }
    
        public function getExpressions() : array    {
            return $this->expressions;
        }
    
    }
    
    $varExtract = new ExtractVars();
    $traverser->addVisitor ($varExtract);
    
    $traverser->traverse($ast);
    
    print_r($varExtract->getVariables());
    
    print_r($varExtract->getExpressions());
    

    Which gives the list of variables as...

    Array
    (
        [0] => $_POST
        [1] => $b
        [3] => $new
        [4] => $nonexist
    )
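
The keys in this list jump from 1 to 3 because getVariables() uses array_unique(), which keeps the key of the first occurrence of each value and drops later duplicates without renumbering. A minimal sketch of that behaviour, using assumed sample data rather than the visitor's real internal list, with array_values() as one way to get consecutive keys:

```php
<?php
// Assumed sample data: names in the order a visitor might have recorded them.
$collected = ['$_POST', '$b', '$_POST', '$new', '$nonexist'];

// array_unique() keeps the first occurrence of each value and preserves its key,
// so the duplicate at key 2 disappears and the keys become 0, 1, 3, 4.
$unique = array_unique($collected);

// array_values() renumbers the remaining entries as 0, 1, 2, 3.
$reindexed = array_values($unique);

print_r($unique);
print_r($reindexed);
```

If consecutive keys matter, wrapping the return value of getVariables() in array_values() would be enough.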
    

    And the list of expressions as

    Array
    (
        [$_POST[$b]] => Array
            (
                [to] => Array
                    (
                        [0] => $_POST
                        [1] => $b
                    )
    
                [0] => Array
                    (
                        [0] => $_POST
                    )
    
            )
    
        [$new] => Array
            (
                [to] => Array
                    (
                        [0] => $new
                    )
    
                [0] => Array
                    (
                        [0] => $nonexist
                    )
    
                [1] => Array
                    (
                        [0] => $_POST
                        [1] => $b
                    )
    
            )
    
    )
    

    Note that the [to] element of each array contains the variables involved on the left of the =.
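
To get closer to the flat "name => source" listing the question asks for, the array returned by getExpressions() can be condensed afterwards. The following is only a sketch: summariseAssignments() is a hypothetical helper (not part of PHP-Parser), and the sample input is what I would expect the visitor to produce for the question's original two statements, written out by hand so the snippet runs without the library.

```php
<?php
// Hypothetical helper: condenses the structure returned by
// ExtractVars::getExpressions() into "variable => source" pairs.
function summariseAssignments(array $expressions): array
{
    $summary = [];
    foreach ($expressions as $target => $parts) {
        // Drop the "to" entry; what remains are the right-hand variable lists,
        // one list per assignment to this target.
        unset($parts['to']);
        $sources = $parts === [] ? [] : array_merge(...array_values($parts));

        // The target is marked as fed by $_POST if any right-hand side mentions it.
        $summary[$target] = in_array('$_POST', $sources, true) ? '$_POST' : 'Empty';

        // Any other right-hand variable is listed as 'Empty' unless already recorded.
        foreach ($sources as $source) {
            if ($source !== '$_POST' && !isset($summary[$source])) {
                $summary[$source] = 'Empty';
            }
        }
    }
    return $summary;
}

// Assumed sample input, shaped like getExpressions() output for:
//   $variable = $_POST['first'];
//   $new = $nonexist;
$expressions = [
    '$variable' => ['to' => ['$variable'], 0 => ['$_POST']],
    '$new'      => ['to' => ['$new'],      0 => ['$nonexist']],
];

$summary = summariseAssignments($expressions);
foreach ($summary as $name => $source) {
    echo "$name => $source\n";
}
```

For this sample input the loop prints $variable => $_POST, $new => Empty and $nonexist => Empty, one per line. Treating only $_POST as a meaningful source is a simplification taken from the question; other superglobals would need to be added to the check.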