php performance lint phabricator arcanist

Why is arc lint so slow?


I am using Arcanist with a large number of linters, both built-in and custom. As we add more, it's becoming increasingly slow.

For a beefy change, maybe with an eslint exception, time arc lint shows it can take up to 30 minutes. For example:

$ time arc lint
<...>
real    8m31.771s
user    17m53.159s
sys     4m52.329s

But on a clean repo with no changes, it's fast:

$ time arc lint
 OKAY  No lint warnings.

real    0m7.961s
user    0m6.763s
sys     0m1.363s

To figure out which linters are running slowly and should be optimized, I'd like to get more granular information about the runtime of each individual linter.

Currently

$ arc lint
Linting....
No Errors

Ideal state

$ arc lint
Linting with eslint...
Elapsed time 18m38.311s
Linting with pylint...
Elapsed time 1m35.334s
Linting with local/
<...>
No Errors

So, how can I get more granular information from each individual arcanist linter? (And otherwise, any tips and tricks for improving the run speed of arc lint?)


Solution

  • To give an accurate answer as to why it's taking up to 30 minutes, the question lacks the following information.

    In general, linting can get very expensive if these parameters are all high (or, in the case of versions, low). It's definitely possible to reach 30 minutes in extreme cases, even without other performance issues.

    PHP is also not the best-optimized language for this kind of workload, although that gap has narrowed a lot in recent versions. Still, it's a language tailor-made for processing HTTP requests, and linting large amounts of source code is quite a different workload.

    For most languages, you should get much better performance by using the most common linter for that specific language.

    Looking at some of the source code of the linting engine, most lines date from 8 to 12 years back. You can imagine that many better-performing linters have been written in that time.

    In fact, if I follow the link on the repo's main README, the Phacility site displays a red warning saying:

    Effective June 1, 2021: Phabricator is no longer actively maintained.

    They explain in more detail here (linked from that warning). If your current setup heavily depends on it, you may want to consider reducing that dependency.

    That probably explains at least part of the performance problems.

    Finding the slow part

    I couldn't find any command option that produces the output you want. On this Debian manpage for arc I did find that you can specify a path, so you can loop over each directory in your source root and time it separately.

    # Lint each top-level directory on its own and print how long each run takes
    for dir in /source/*/; do
      time arc lint "$dir"
    done

    You can then further pin down the directory that is the slowest.
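
If there are many directories, reading a stream of time reports gets tedious. The following is a minimal sketch, not part of Arcanist, that collects one elapsed time per directory and sorts the slowest first. The function name rank_lint_times and the LINT_CMD variable are hypothetical names introduced here, and date +%s only gives whole seconds, which should be enough for runs that take minutes.

```shell
#!/usr/bin/env bash
# Sketch: time a lint command once per directory and rank the results.
# LINT_CMD is a hypothetical override so the helper can be tried without arc.
LINT_CMD=${LINT_CMD:-"arc lint"}

rank_lint_times() {
  local dir start end
  for dir in "$@"; do
    [ -d "$dir" ] || continue            # skip anything that is not a directory
    start=$(date +%s)
    # LINT_CMD is intentionally unquoted so "arc lint" splits into two words.
    $LINT_CMD "$dir" >/dev/null 2>&1     # discard lint output, only time matters
    end=$(date +%s)
    printf '%s\t%s\n' "$((end - start))" "$dir"
  done | sort -rn                        # largest elapsed time first
}
```

Calling rank_lint_times /source/* prints one line per directory with the elapsed seconds in the first column. Running it again on the subdirectories of the slowest entry narrows things down further.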

    Another thing you can try is disabling all rulesets and re-enabling them one by one, keeping track of the time. At first it will take almost no time, so it should be relatively fast to locate the first rule that starts adding a substantial amount of time.
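
A variation on that idea, sketched below under several assumptions, is to run with exactly one linter enabled at a time instead of adding them cumulatively. It assumes the linters are declared under the top-level "linters" key of a .arclint JSON file, that jq is installed, and that linter names contain no whitespace. The function name time_each_linter and the LINT_CMD variable are hypothetical. The script temporarily rewrites the config file and restores it at the end, so treat it as a starting point rather than a finished tool.

```shell
#!/usr/bin/env bash
# Sketch: lint with a single .arclint entry enabled at a time and report seconds.
# Assumes jq is available; LINT_CMD is a hypothetical override for the command.
LINT_CMD=${LINT_CMD:-"arc lint"}

time_each_linter() {
  local config=${1:-.arclint} backup name start end
  backup=$(mktemp) || return 1
  cp "$config" "$backup" || return 1
  # Assumes linter names have no whitespace, so word splitting is safe here.
  for name in $(jq -r '.linters | keys[]' "$backup"); do
    # Rewrite the config so that only the linter called $name remains.
    jq --arg k "$name" '.linters |= {($k): .[$k]}' "$backup" > "$config"
    start=$(date +%s)
    $LINT_CMD >/dev/null 2>&1            # unquoted on purpose: splits "arc lint"
    end=$(date +%s)
    printf '%s\t%s\n' "$((end - start))" "$name"
  done
  cp "$backup" "$config"                 # restore the original configuration
  rm -f "$backup"
}
```

Run time_each_linter from the repository root on the same beefy change that was slow. If the run is interrupted, the config file is left trimmed to a single linter and has to be restored by hand, for example from version control.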