Clang static analyzer checker for get/set functions with reference types ("location cannot be a NonLoc")


I'm writing a clang static analyzer checker for a pair of functions that save the passed argument value and return it:

   void set(const int& value);
   const int& get();

The real set implementation saves the passed value to an internal variable of type int, and get returns a reference to that variable.
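For concreteness, here is a minimal sketch of what such a pair might look like. The variable name storedValue and the use of an anonymous namespace are assumptions made for illustration; the real implementation is not shown in this question.

```cpp
#include <cassert>

namespace {
// Hypothetical internal variable; the real name and storage are not shown.
int storedValue = 0;
} // namespace

// Copies the referenced value into the internal variable.
void set(const int &value) { storedValue = value; }

// Returns a reference to the internal variable, not a copy of it.
const int &get() { return storedValue; }
```

With this shape, every call to get() refers to the same object, so a reference obtained before a later set() observes the new value.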

This is my checker implementation:

#include "clang/StaticAnalyzer/Checkers/BuiltinCheckerRegistration.h"
#include "clang/StaticAnalyzer/Core/Checker.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/CallDescription.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"

using namespace clang;
using namespace ento;

namespace {

class checkerTest : public Checker<eval::Call> {

  bool handleSet(CheckerContext &C, const CallEvent &Call) const;
  bool handleGet(CheckerContext &C, const CallEvent &Call) const;

public:
  bool evalCall(const CallEvent &Call, CheckerContext &C) const;

  using FnHandler = bool (checkerTest::*)(CheckerContext &, const CallEvent &Call) const;
  CallDescriptionMap<FnHandler> Functions = {
      {{{"set"}, 1}, &checkerTest::handleSet},
      {{{"get"}, 0}, &checkerTest::handleGet},
  };
};

} // namespace

SVal g_value;

bool checkerTest::handleSet(CheckerContext &C, const CallEvent &Call) const {
  SVal location = Call.getArgSVal(0);
  QualType LoadTy = Call.getArgExpr(0)->getType();
  ProgramStateRef State = C.getState();
  SVal Value = State->getSVal(location.castAs<Loc>(), LoadTy);
  C.addTransition(State);
  g_value = Value;
  return true;
}

bool checkerTest::handleGet(CheckerContext &C, const CallEvent &Call) const {
  ProgramStateRef State = C.getState();
  State = State->BindExpr(Call.getOriginExpr(), C.getLocationContext(), g_value);
  C.addTransition(State);
  return true;
}

bool checkerTest::evalCall(const CallEvent &Call, CheckerContext &C) const {
  const FnHandler *Handler = Functions.lookup(Call);
  if (Handler)
    return (this->**Handler)(C, Call);
  return false;
}

void ento::registercheckerTest(CheckerManager &mgr) {
  mgr.registerChecker<checkerTest>();
}

bool ento::shouldRegistercheckerTest(const CheckerManager &mgr) {
  if (mgr.getLangOpts().CPlusPlus)
    return true;
  return false;
}

The set handler gets the argument value and saves it to a global variable; the get handler binds this value to the call expression.

I know that the right way to save g_value is to store it in a special map, but for testing purposes I use a global variable.

On this simple test:

void set(const int& value);
const int& get();

int main()
{
    set(0);
    int res = get();
    return res;
}

clang crashes due to an assertion in evalLoad:

Assertion `!isa<NonLoc>(location) && "location cannot be a NonLoc."' failed.

I think the engine expects the return value to be a reference, that is, the location of a symbol, whereas g_value is the value itself.

How do I get the location of the returned value and return it?

I used clang-16.


Solution

  • The assertion is triggered because the line:

      State = State->BindExpr(Call.getOriginExpr(), C.getLocationContext(), g_value);
    

    binds an integer value (zero) to the call expression get(), which returns a reference. ExprEngine sees that the call to get() is subject to an lvalue-to-rvalue conversion and therefore expects that, if the expression is bound, it is bound to a value that represents a location (not an integer), since lvalues are represented by locations.

    As for how to get the location of the returned value and return it:

    The location of the returned value should be a MemRegionVal referring to a SymbolicRegion. This way it names an abstract location representing whatever the returned reference refers to.

    The symbol, represented by a SymExpr, could be a fresh ("conjured") one for each call site. That is the simplest approach, but it means the analysis will treat each call as returning a reference to a different memory location, which might not be what you want.

    Nevertheless, for the simple approach, the code is:

      // The call expression whose value is being modeled.
      const Expr *expr = Call.getOriginExpr();

      // Get the type, preserving reference-ness.
      QualType resultType = Call.getResultType();
    
      // Conjure a symbolic memory region to represent what the reference
      // points to.
      SVal getReferentVal =
        C.getSValBuilder().getConjuredHeapSymbolVal(
          expr,
          C.getLocationContext(),
          resultType,
          C.blockCount());
    
      ProgramStateRef State = C.getState();
    
      // Bind the call expression to the fresh location.
      State = State->BindExpr(expr, C.getLocationContext(), getReferentVal);
    
      // Bind the location to the previously saved value.
      State = State->bindLoc(getReferentVal.castAs<Loc>(), g_value, C.getLocationContext());
    
      // Update the abstract state with those new bindings.
      C.addTransition(State);
    

    This uses getConjuredHeapSymbolVal to obtain a fresh symbolic memory region that (as it happens) is assumed to be somewhere on the heap.

    I'm using the four-argument overload since the three-argument form would fail an assertion related to the type of the expression. The getResultType() call ensures resultType is a ReferenceType, whereas the reference-ness is lost when calling expr->getType().

    Then, we bind the call expression to the conjured location, and in turn bind the location to the value that was previously stashed in g_value. Consequently, the saved value can be retrieved from State by calling getSVal on the expression, and then again on the returned location.

    To improve the accuracy, you probably want to create a fresh location only if the expression is not already bound to one, since that will model the case where multiple calls to get() return the same thing.
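To make the two-level binding and the location reuse concrete, here is a small standalone toy model. It does not use the analyzer's API at all: ToyLoc, ToyInt, ToySVal, ToyState, modelGet and loadThrough are hypothetical names invented for this sketch, and the maps only loosely imitate the analyzer's expression bindings and store.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <variant>

// Toy stand-ins for the analyzer's value kinds. These are hypothetical
// types for illustration, not the real ento::Loc / ento::NonLoc classes.
struct ToyLoc { int regionId; }; // names an abstract memory region
struct ToyInt { int value; };    // a plain (non-location) value
using ToySVal = std::variant<ToyInt, ToyLoc>;

struct ToyState {
  std::map<std::string, ToySVal> exprBindings; // expression -> value
  std::map<int, int> store;                    // region id -> stored int
  std::optional<ToyLoc> getReferent;           // region that get() refers to
  int nextRegionId = 0;                        // source of fresh region ids
};

// Model of the get handler: bind the call expression to a location, and
// bind that location to the saved value. The region is conjured only on
// first use, so repeated calls to get() refer to the same region.
inline void modelGet(ToyState &state, const std::string &callExpr,
                     int savedValue) {
  if (!state.getReferent)
    state.getReferent = ToyLoc{state.nextRegionId++};
  state.exprBindings.insert_or_assign(callExpr, ToySVal(*state.getReferent));
  state.store[state.getReferent->regionId] = savedValue;
}

// Model of the load performed for an lvalue-to-rvalue conversion: look up
// the expression, require a location, then read the store through it.
inline std::optional<int> loadThrough(const ToyState &state,
                                      const std::string &expr) {
  auto bound = state.exprBindings.find(expr);
  if (bound == state.exprBindings.end())
    return std::nullopt;
  const ToyLoc *loc = std::get_if<ToyLoc>(&bound->second);
  if (!loc)
    return std::nullopt; // a non-location here is what the assertion rejects
  auto cell = state.store.find(loc->regionId);
  if (cell == state.store.end())
    return std::nullopt;
  return cell->second;
}
```

In this model, binding the expression directly to a ToyInt makes loadThrough fail, which corresponds to the failed assertion, while modelGet lets the load succeed and gives two get() call sites the same region. In a real checker, the remembered region would presumably live in the program state (for example via a trait registered with REGISTER_TRAIT_WITH_PROGRAMSTATE) rather than in a global, so that each analysis path carries its own copy.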