Suppose you wanted to run CodeQL analysis across a large set of repos that contain Java code.
Some repos have mvn/ant/gradle set up properly to build their code. For those, CodeQL results look solid and the analysis works right next to their existing build steps.
Other repos just have pom.xml files (30+ within a single repo) that don't build, or have no build tooling at all and just .java files lying around in directories. In those cases CodeQL either cannot autobuild or the team cannot provide the required build steps, so nothing gets analyzed (no files go through compilation). I'm wondering if there's anything wrong with the brute-force approach below:
find . -name "*.java" -print | xargs javac
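For context, here is roughly how that brute-force compile could be wired into CodeQL database creation for a repo with no working build. This is a sketch, not the exact setup described above: the `compile-all.sh` name, `java-db` database name, and output paths are all placeholders I made up.

```shell
# Hypothetical wiring: wrap the brute-force compile in a script and hand
# it to CodeQL as the "build" command instead of relying on autobuild.
cat > compile-all.sh <<'EOF'
#!/bin/sh
# Compile every .java file found; per-file failures are tolerated so that
# whatever does compile still lands in the CodeQL database.
find . -name "*.java" -print | xargs javac
EOF
chmod +x compile-all.sh

# Create the database using that script as the build step, then analyze.
codeql database create java-db --language=java --command=./compile-all.sh
codeql database analyze java-db --format=sarif-latest --output=results.sarif
```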
This approach immediately solves the issues listed above: the 30+ pom files (some of them broken) and the Java repos with no build tools. All files are "covered" and results are found/reported in the Security tab of each repo.
I was initially stunned this worked (all files are scanned and findings show up properly in the SARIF file).
My question is: is there any reason why I wouldn't do this for all Java repos by default?
It feels very gross to have every call to javac error out early, yet still have enough extracted for the files to be added to the CodeQL database. I haven't seen anyone else talk about this approach to forcing CodeQL to analyze all files in a repo, but the same trick worked for Kotlin and DotNet repos where we had similar issues with build steps not being clearly defined or working.
This method is awesome and I rolled it out for all compiled languages (DotNet, Java, Kotlin, Go). The big gotcha is that you will miss files when devs use spaces in their filenames (an easy sed pass to the rescue there, so the compilers don't choke on and skip those files).
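The sed pass can also be avoided entirely: `find -print0` with `xargs -0` passes each filename intact regardless of spaces. A minimal demonstration of the difference (the `demo/` layout is made up for illustration):

```shell
# Set up two files, one with a space in its name.
mkdir -p demo/src
touch demo/src/Plain.java "demo/src/Has Space.java"

# Naive split: xargs breaks "Has Space.java" into two bogus arguments.
find demo -name "*.java" | xargs -n1 printf '%s\n' | wc -l             # 3 tokens

# Null-delimited: both files arrive as single intact arguments.
find demo -name "*.java" -print0 | xargs -0 -n1 printf '%s\n' | wc -l  # 2 files
```

So the whitespace-safe form of the original pipeline would be `find . -name "*.java" -print0 | xargs -0 javac`.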
Naive approach > no approach at all. If there is no build defined in a given repo, I'll 100% take a classic SAST scan and have the team deal with the typical false positives from a security tool being dumb.