scala functional-programming scalaz scala-cats

Do cats and scalaz create performance overhead on application?


I know this may be a silly question, but given my limited programming experience it came to mind. Cats and scalaz let us write Scala in a pure functional style, similar to Haskell. To get this, we have to add those libraries to our projects as extra dependencies, and then wrap our code in their types and functions. That means extra code and extra dependencies. I don't know whether this creates larger objects in memory, and that is what makes me wonder. So my question: will I face performance issues, such as higher memory consumption, if I use cats/scalaz? Or should I avoid them if my application needs performance?


Solution

  • Do cats and scalaz create performance overhead on application?

    Absolutely.

    In the same way that any line of code adds performance overhead.
    So, if that is your concern, then don't write any code (well, actually, the world may be simpler if we had never tried all this).

    Now, flippant answer aside, the proper question you should be asking is: "Is the overhead of library X harmful to my software?" Remember that this applies to any library, and in fact to any code you write, any algorithm you pick, etc.

    And, to answer that question, we need a few things first.

    1. Define the SLOs (service level objectives) the software you are writing must meet. Without those, any performance question or observation you make is pointless. It doesn't matter if something is faster or slower if you don't know whether the difference is meaningful for you and your clients.
    2. Once you have SLOs, you need to perform stress tests to verify if your current version of the software satisfies those. Because, if your current code is performant enough, then you should worry about other things like maintainability, testing, adding more features, etc.
      PS: Remember that those SLOs should not be raw numbers but be expressed in terms of percentiles (for example, "99% of requests complete within some limit"). The same goes for the results of the tests.
    3. When you find that you are failing your SLOs, you need to do proper benchmarking and debugging to identify the bottlenecks of your project. As you saw, caring about performance would in principle have to be done on each line of code, but that is a lot of work that usually doesn't produce any relevant output. Thus, instead of evaluating the performance of everything, we find the bottlenecks first: those small pieces of the app that contribute the most to the overall performance of your software (remember the Pareto principle).
      Remember that in this step we have to look at the whole system; the network matters too. (You will find that the network is usually the biggest slowdown; thus, you would usually look for architectural solutions, like Fibers instead of Threads, rather than trying to optimize small functions. Also, sometimes the easier and cheaper solution is better infrastructure.)
    4. When you find the bottleneck, you need to formulate some alternatives, implement them, and not only benchmark them but also do statistical hypothesis testing to validate whether the proposed changes are worth it. And, of course, validate whether they are enough to satisfy the SLOs.
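
    To make the steps above a bit more concrete, here is a minimal sketch in plain Scala (no cats or scalaz needed). It assumes you have already collected request latencies in milliseconds from a stress test. The names `SloCheck`, `Slo`, `percentile`, `meets` and `welchT` are made up for this illustration, the nearest-rank percentile is just one of several common definitions, and Welch's t statistic is only one possible choice for the hypothesis test in step 4.

    ```scala
    object SloCheck {
      // Hypothetical SLO shape: "p% of requests complete within limitMs".
      final case class Slo(p: Double, limitMs: Long)

      // Nearest-rank percentile: the smallest sample such that at least
      // p% of all samples are less than or equal to it.
      def percentile(latenciesMs: Seq[Long], p: Double): Long = {
        require(latenciesMs.nonEmpty, "need at least one sample")
        require(p > 0 && p <= 100, "p must be in (0, 100]")
        val sorted = latenciesMs.sorted
        val rank = math.ceil(p * sorted.length / 100.0).toInt
        sorted(rank - 1)
      }

      // Step 2: does the measured percentile stay within the SLO limit?
      def meets(latenciesMs: Seq[Long], slo: Slo): Boolean =
        percentile(latenciesMs, slo.p) <= slo.limitMs

      // Step 4: Welch's t statistic for two independent sets of measurements
      // (e.g. before and after a change). Turning it into a p-value needs the
      // t distribution, which is left out of this sketch.
      def welchT(a: Seq[Double], b: Seq[Double]): Double = {
        require(a.length > 1 && b.length > 1, "need at least two samples each")
        def mean(xs: Seq[Double]): Double = xs.sum / xs.length
        def variance(xs: Seq[Double]): Double = {
          val m = mean(xs)
          xs.map(x => (x - m) * (x - m)).sum / (xs.length - 1)
        }
        (mean(a) - mean(b)) / math.sqrt(variance(a) / a.length + variance(b) / b.length)
      }

      def main(args: Array[String]): Unit = {
        // Made-up latencies, only to show how the functions are called.
        val latencies: Seq[Long] = 1L to 100L
        val slo = Slo(p = 99, limitMs = 100)
        println(s"p99 = ${percentile(latencies, 99)} ms, SLO met: ${meets(latencies, slo)}")
      }
    }
    ```

    In a real project you would feed this with measurements from a load-testing tool and a proper benchmark harness rather than hand-written numbers; the point is only that both the SLO and the test result are expressed as percentiles, and that "faster" is checked statistically instead of by eye.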

    Thus, as you can see, performance is an art and a lot of work. So, unless you are committed to doing all this, stop worrying about something you will not measure and optimize properly. Rather, focus on increasing the maintainability of your code. This actually also helps performance, because when you find that you need to change something you will be grateful that the code is as clean as possible and that the whole architecture allows for an easy change.

    And, believe me when I say that using tools like cats, cats-effect, fs2, etc. will help in that regard. Also, they are actually pretty well optimized at their core, so you should be good for a lot of use cases.


    Now, the big exception is that if you know that the work you are doing will be very CPU and memory bound then yeah, you pretty much can be sure all those abstractions will be harmful. In those cases, you may even want to stay away from the JVM and rather write pretty low-level code in a language like Rust which will provide you with proper tools for that kind of problem and still be way safer than plain old C.
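
    As a rough illustration of where such overhead can come from in CPU-bound code, the sketch below compares a generic, typeclass-style sum with a hand-written loop over a primitive array. The `Combine` trait is a hypothetical stand-in written for this example, not the cats or scalaz API. The assumption being illustrated is that generic code on the JVM generally works with boxed values (`java.lang.Long`) and goes through an extra method call per element, while the primitive loop does not. The timing helper is deliberately naive; for real numbers you would want a proper harness such as JMH with warm-up and many iterations.

    ```scala
    object BoxingSketch {
      // Hypothetical monoid-like typeclass, for illustration only.
      trait Combine[A] {
        def empty: A
        def combine(x: A, y: A): A
      }

      object Combine {
        // Instance placed in the companion so it is found implicitly.
        implicit val longAdd: Combine[Long] = new Combine[Long] {
          def empty: Long = 0L
          def combine(x: Long, y: Long): Long = x + y
        }
      }

      // Generic version: works for any A with a Combine instance,
      // but elements of Vector[A] are stored as boxed objects.
      def sumGeneric[A](xs: Vector[A])(implicit c: Combine[A]): A =
        xs.foldLeft(c.empty)((acc, x) => c.combine(acc, x))

      // Specialised version: a plain while loop over unboxed longs.
      def sumPrimitive(xs: Array[Long]): Long = {
        var acc = 0L
        var i = 0
        while (i < xs.length) {
          acc += xs(i)
          i += 1
        }
        acc
      }

      // Naive single-shot timer; results vary a lot between runs and JVMs.
      def timeNanos[A](body: => A): (A, Long) = {
        val start = System.nanoTime()
        val result = body
        (result, System.nanoTime() - start)
      }

      def main(args: Array[String]): Unit = {
        val boxed: Vector[Long] = (1L to 1000000L).toVector
        val primitive: Array[Long] = boxed.toArray
        val (genericSum, genericNs) = timeNanos(sumGeneric(boxed))
        val (primitiveSum, primitiveNs) = timeNanos(sumPrimitive(primitive))
        println(s"generic:   sum=$genericSum in $genericNs ns")
        println(s"primitive: sum=$primitiveSum in $primitiveNs ns")
      }
    }
    ```

    Both functions compute the same result; whether the difference in cost matters is exactly the kind of thing the SLO-driven process above is meant to decide, so treat any single run of this as an anecdote rather than a measurement.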