java codeql

CodeQL query to extract the local data flow of a Java method takes too long


I want to extract the local data flow of a Java method. So far I have this query to extract wherever a variable is accessed, declared, or assigned within the function:

/**
 * @name Empty block
 * @kind problem
 * @problem.severity warning
 * @id java/example/empty-block
 */


 import java
 import semmle.code.java.dataflow.DataFlow
 

 from File fl, LocalVariableDeclExpr ld, VarAccess va, Assignment asn
 where 
     fl.getBaseName() = "Calculator.java"
     and 
     ld.getEnclosingCallable().getName()= "calc"
     and va.getEnclosingCallable().getName() = "calc"
     and asn.getEnclosingCallable().getName() = "calc"
     
     
 and ld.getLocation().getFile() = fl 
 and va.getLocation().getFile() = fl 
 and asn.getLocation().getFile() = fl 
 and va.getLocation().getStartLine() = ld.getLocation().getStartLine()
 select ld, "\"" + va.getVariable().getName()+"\""  + "->" + "\"" +ld.getVariable().getName()+"\""  + "\n" + "\"" +asn.getDest()+"\""  + "->" + "\"" +asn.getSource()+"\""  + "\n"

The problem is that the query takes a very long time in the SELECT phase. I am using this repository as a database. The file name is Calculator.java and this is the method:

public double calc(double x, String input, char opt) {
        inText.setFont(inText.getFont().deriveFont(Font.PLAIN));
        double y = Double.parseDouble(input);
        switch (opt) {
            case '+':
                return x + y;
            case '-':
                return x - y;
            case '*':
                return x * y;
            case '/':
                return x / y;
            case '%':
                return x % y;
            case '^':
                return Math.pow(x, y);
            default:
                inText.setFont(inText.getFont().deriveFont(Font.PLAIN));
                return y;
        }
    }


Thanks.


Solution

  • Do you want the declaration, assignment, and read of the same variable? As currently written, your query does not tie the three together by variable, so it selects combinations involving any variables inside the calc method.

    Your query might also perform poorly because it repeats the callable-name and file checks for each of the three expressions. It would probably be faster (and easier to read) to introduce a separate variable for the method and compare against that; see the query code further below.

    Keep in mind that CodeQL is a database query language, so the intermediate result in your case is a set of tuples (LocalVariableDeclExpr, VarAccess, Assignment). This means:

    1. If a variable has no Assignment, the query has no result for that variable (note that the initialization of a local variable is not considered an Assignment, see GitHub issue).
    2. You get permutations of that tuple which are probably not interesting to you, e.g.: (Decl, Access1, Assign1), (Decl, Access1, Assign2), (Decl, Access2, Assign1), ...

    So it may be more useful to get, for every variable, just its VarAccess expressions (which cover both reads of and writes to the variable), for example:

    import java
    
    from Method method, LocalVariableDeclExpr ld, VarAccess va
    where
      method.getDeclaringType().hasName("Calculator") and
      method.hasName("calc") and
      ld.getEnclosingCallable() = method and
      va.getEnclosingCallable() = method and
      // And both belong to the same variable
      ld.getVariable() = va.getVariable()
    select ld, va
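
    Because a local variable's initializer is not an Assignment (point 1 above), a separate query can list each declaration together with its initializer. This is only a sketch built on the same Calculator/calc assumptions as the query above; it uses LocalVariableDeclExpr.getInit(), and declarations without an initializer simply produce no row:

    ```ql
    import java

    from Method method, LocalVariableDeclExpr ld
    where
      method.getDeclaringType().hasName("Calculator") and
      method.hasName("calc") and
      ld.getEnclosingCallable() = method
    // getInit() has no result for declarations without an initializer,
    // so those declarations are left out of the output
    select ld, ld.getInit()
    ```

    For the calc method in the question, this should report the declaration of y with Double.parseDouble(input) as its initializer. The parameters x, input and opt are not local variable declarations, so they would not appear.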
    

    Also note that your query is not related to dataflow. It just finds declarations, assignments, and usages of variables, which do not necessarily occur in the right (or even any) order. See the documentation for more information about tracking dataflow. You may also be interested in visualizing the dataflow with path queries.

    It is also important to consider the difference between dataflow and taint tracking. Dataflow only covers cases where the exact same value flows between variables and calls, whereas taint tracking also covers cases where the value is converted or transformed, for example obtaining a substring from a string (see also the documentation).
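
    As a starting point for actual local dataflow, here is a minimal sketch that lists the individual local flow steps inside the method. It assumes the same Calculator/calc names as above and uses DataFlow::localFlowStep from the standard Java dataflow library; the variable names nodeFrom and nodeTo are just illustrative:

    ```ql
    import java
    import semmle.code.java.dataflow.DataFlow

    from Method method, DataFlow::Node nodeFrom, DataFlow::Node nodeTo
    where
      method.getDeclaringType().hasName("Calculator") and
      method.hasName("calc") and
      // Restrict to nodes inside calc; local flow never leaves the callable
      nodeFrom.getEnclosingCallable() = method and
      // One value-preserving step from nodeFrom to nodeTo
      DataFlow::localFlowStep(nodeFrom, nodeTo)
    select nodeFrom, nodeTo
    ```

    Each result row is one edge of the local flow graph, so the rows can be turned into "a -> b" output if needed. To also include steps where the value is transformed, one option is to import semmle.code.java.dataflow.TaintTracking and replace the step with TaintTracking::localTaintStep(nodeFrom, nodeTo); which library calls count as taint steps depends on the models shipped with the CodeQL version in use.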