Tags: ios, swift, performance, generics, existential-type

Should using "any" be avoided when optimising code in Swift?


I was reading about the differences between some and any, and I often read that using any causes performance issues. The authors suggest always using some unless using any is absolutely necessary.

I encountered a situation where applying this rule of "avoid using existential types if possible" made my code look strange. Consider the following example:

protocol DogDelegate {
    func didBark()
    func willBark()
}

struct Dog<D: DogDelegate> {
    var delegate: D?  // << Making this `(any DogDelegate)?` would look so much better

    func bark() {
        delegate?.willBark()
        print("Woof!")
        delegate?.didBark()
    }
}

In my app, I have a delegate-like property which allows optional customisation of a function. It sort of allows the caller to "modify" the barking a little bit.

If I wanted to make my code as fast as possible, would using generics be preferred in this case? Or is the performance drawback so small that even if I had a million Dogs in an array while using any it wouldn't cause a significant performance drop?


Solution

  • tl;dr: for parameters to a method, prefer taking some over any, as there's no downside or loss of generality; for properties, there are definitely good reasons to stick with any, and whether switching is worth the extra code is context-dependent and should be backed by real-world measurement.


    To understand why you might want to use some Protocol over any Protocol, or vice versa, it helps to understand what any and some mean under the hood.

    By way of analogy, given that you have a protocol P, you can imagine any P and some P as being two different types of boxes:

    1. any P is an opaque box, which can hold inside of it anything that looks like a P. Because it's opaque, you can't quite tell what's inside unless you walk over to it and open it up, to inspect the contents (but on the upside, you can also swap out the contents without anyone noticing!)
    2. some P, on the other hand, is a transparent (almost imaginary) box which can hold only one specific type of P-shaped item at a time, with the benefit that you can see what's actually inside the box from across the room — no need to inspect it directly (downside: you can't change what's inside without changing the shape of the box, too)

    In many ways, working with these two types of boxes is pretty similar: at the end of the day, they're both meant to hold something that looks like a P, and that's their main purpose. The difference between them is the transparency.


    As boxes, there are two main places that any and some can be used:

    1. As parameters to methods
    2. As stored values / properties

    In the case of (1), you're looking at the difference between

    func ƒ1(_ p: any P) {
        // Do `P` type things with `p`.
    }
    

    and

    func ƒ2(_ p: some P) {
        // Do `P` type things with `p`.
    }
    

    The way an individual value gets used within ƒ1 and ƒ2 is identical, with no discernible effect to the user of p. (Though there is an effective difference, which I'll get to below.)

    In the case of (2), there are significant differences between holding on to an any P and a some P:

    1. Storing some P requires you to make the outer type generic. This can add a lot of additional code, and requires all consumers of the type to also deal with a generic type, possibly propagating that genericity upwards too.
    2. You cannot store different types of P values inside of some P, meaning that you may not be able to reassign the property and must instead create an entirely new value (e.g., in your example, a brand new Dog<D> for every different type of D delegate).
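
    To make these two points concrete, here is a minimal sketch based on the Dog<D> from the question. LoudDelegate, QuietDelegate, and describe are hypothetical names invented for this illustration.

```swift
protocol DogDelegate {
    func willBark()
    func didBark()
}

// Hypothetical delegates, made up for this example.
struct LoudDelegate: DogDelegate {
    func willBark() { print("Get ready!") }
    func didBark() { print("That was loud.") }
}

struct QuietDelegate: DogDelegate {
    func willBark() {}
    func didBark() {}
}

struct Dog<D: DogDelegate> {
    var delegate: D?
}

// The delegate type is now part of the dog's type.
let loudDog = Dog(delegate: LoudDelegate())    // Dog<LoudDelegate>
let quietDog = Dog(delegate: QuietDelegate())  // Dog<QuietDelegate>

// Neither of these compiles, because the two dogs have different types:
// var dog = loudDog
// dog.delegate = QuietDelegate()
// let dogs = [loudDog, quietDog]   // only allowed by erasing to [Any]

// Code that accepts a Dog has to become generic as well.
func describe<D: DogDelegate>(_ dog: Dog<D>) -> String {
    "Dog with delegate type \(D.self)"
}

print(describe(loudDog))
print(describe(quietDog))
```

    Note how the genericity leaks into describe: every function or type that touches a Dog has to either name a concrete D or carry its own generic parameter.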

    So, why use one over the other?

    1. As a parameter, there's little reason to take any P over some P these days. Prior to SE-0352 (Implicitly Opened Existentials), the worlds of existentials (any P) and generics (some P) were very different and entirely separate. Both were "viral" in a sense: if you had an existential value, you couldn't use it to call a generic method, which meant you'd need to write methods which took existentials; and if you wanted to write methods which took advantage of generics, you had to propagate genericity "upwards" to preserve the types everywhere.

      No longer. With SE-0352 (Swift 5.7), you can pass an existential to a generic method, and Swift will "open" the existential box for you, giving you access to the real type in a way that a generic method can accept. Because of this, there's now little reason not to write generic methods, because they can accept both existentials and non-existentials.

    2. As a property, there's absolutely a reason to prefer any P over some P in at least one case: your property needs to be able to hold one of multiple types of Ps, and you don't know which at compile time. This is the primary reason for existentials to exist: they are an abstraction over the general concept of P, and are general enough to be able to hold any P inside of them.

      Your delegate example is exactly a case where you would likely want to write any DogDelegate: Dog doesn't care what the concrete type of delegate is, just that it can do DogDelegate things. It makes life significantly easier to be able to assign one of several different DogDelegate objects to your Dog, without having to make any changes to Dog itself.
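
    As a rough sketch of what that looks like for the code in the question: BarkCounter and Announcer below are hypothetical delegates invented for this example; only Dog and DogDelegate come from the question.

```swift
protocol DogDelegate {
    func willBark()
    func didBark()
}

struct Dog {
    // Existential storage: any conforming type can be assigned here.
    var delegate: (any DogDelegate)?

    func bark() {
        delegate?.willBark()
        print("Woof!")
        delegate?.didBark()
    }
}

// Hypothetical delegates, made up for this example.
final class BarkCounter: DogDelegate {
    private(set) var barks = 0
    func willBark() {}
    func didBark() { barks += 1 }
}

struct Announcer: DogDelegate {
    func willBark() { print("About to bark") }
    func didBark() { print("Done barking") }
}

let counter = BarkCounter()
var dog = Dog()

dog.delegate = counter      // a class instance
dog.bark()
dog.delegate = Announcer()  // a struct of a completely different type
dog.bark()
dog.delegate = nil          // or no delegate at all
dog.bark()

// Dogs with different delegate types share one type, so they fit in one array.
let kennel = [Dog(delegate: counter), Dog(delegate: Announcer()), Dog()]
kennel.forEach { $0.bark() }

print(counter.barks)  // prints 2: once via `dog`, once via `kennel`
```

    Dog stays a plain, non-generic type, and callers never need to know which delegate type (if any) is installed.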

    Typically, the switch from any to some in the case of (1) just requires replacing the keyword; in the case of (2), there's a significant difference between them, and you need to evaluate what makes sense for your code in context.
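
    A small sketch of the parameter case, assuming Swift 5.7 or later (needed for both some in parameter position and implicitly opened existentials). The protocol P and the types A and B are placeholders; greetAny and greetSome are hypothetical names.

```swift
protocol P {
    var name: String { get }
}

struct A: P { let name = "A" }
struct B: P { let name = "B" }

// Same body; only the keyword in the signature differs.
func greetAny(_ p: any P) -> String { "Hello, \(p.name)" }
func greetSome(_ p: some P) -> String { "Hello, \(p.name)" }

let concrete = A()
let boxed: any P = B()

print(greetAny(concrete))   // the concrete value is wrapped in an existential
print(greetSome(concrete))  // generic call with the concrete type A
print(greetAny(boxed))      // the existential is passed through as-is
print(greetSome(boxed))     // the existential is opened implicitly (SE-0352)

// A heterogeneous array of existentials can still feed the generic function.
let values: [any P] = [A(), B()]
for value in values {
    print(greetSome(value))
}
```

    Callers that already hold an any P keep working after the signature changes to some P, which is why the swap is usually free for parameters.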

    Performance

    To get a bit more technical, there are two performance characteristics of any and some to keep in mind, given the choices above:

    1. When people refer to existential "boxes", they are literally referring to a container in memory which holds a value of some type conforming to P (inline if the value is small, otherwise behind a pointer to separate storage), together with the type information needed to call P's methods on it. The "box" is a reified value, which can respond to methods by passing them along to the underlying value, at the slight cost of additional indirection to do so (e.g., a method-table lookup, and for larger values an additional pointer access).

      The reason I referred to some P as an imaginary box above is that in reality, there is no box: some is a compile-time concept exclusively. some is a slightly nicer spelling for generics, such that

      func f<T: P>(_ p: T) { ... }
      

      and

      func g(_ p: some P) { ... }
      

      are identical. In f, you can actually "see" and name the effective type T the method is being called with, whereas in g the actual type is slightly obscured, for cases where you don't really care about it. (It's made a bit more "opaque", and in fact, these are known as opaque parameter types, matching opaque result types.)

      Because of this, some P has no boxing overhead: the compiler knows at compile time, at each call site, what the effective type really is; it's only obscured from you, the developer.

    2. And because of this (that the compiler sees through some types), a generic method can be optimized more thoroughly by the compiler than one which takes an existential, for example by specializing and inlining it for the concrete type. How this manifests in practice depends very heavily on what the method actually does, so it's hard to generalize, but in some cases, the optimization may be significant.
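
    One way to see that the box is a real value is to compare memory layouts. This is only a sketch: Shape and Square are hypothetical types, and the exact sizes are implementation details of the Swift runtime that vary by platform, so the code prints them rather than assuming particular numbers.

```swift
protocol Shape {
    func area() -> Double
}

// Hypothetical conforming type, made up for this example.
struct Square: Shape {
    var side: Double
    func area() -> Double { side * side }
}

// The concrete value is just its stored properties (a single Double here).
print("Square:    \(MemoryLayout<Square>.size) bytes")

// The existential container additionally reserves inline storage for the
// value and carries pointers to type metadata and the protocol's method table.
print("any Shape: \(MemoryLayout<any Shape>.size) bytes")

let concrete = Square(side: 2)
let boxed: any Shape = concrete

// Both calls produce the same result; the second goes through the box.
print(concrete.area())
print(boxed.area())
```

    On current 64-bit platforms the existential is typically several machine words larger than a small struct like Square, which is the memory side of the indirection described above.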

    "I often read that using any causes performance issues. The authors suggest always using some unless using any is absolutely necessary."

    In practice, it's easy for folks to get a bit carried away with suggestions like this; you absolutely should not be terrified of using existential types, especially not to the point of rewriting your codebase to eliminate all anys. Nothing has effectively changed with the labeling of any and some; the spelling just makes more explicit what the code might be doing, and we've used existentials just fine in Swift for a long time!

    The cost of using an existential over a generic type is likely marginal: the cost of the extra indirection is vanishingly small relative to what your program is doing, but it is technically there. In some cases, there may be a performance benefit to switching over to a generic type because of more thorough compiler optimizations, but that has to be judged case by case.

    The main reason folks recommend switching over is that in many cases, you can simply swap any for some with no further work necessary, and not doing so leaves some amount of "free performance" on the table. Whether and how you make such changes is up to you, but the reason the comments above all recommend benchmarking is that you can find out for yourself, for your specific application, whether this makes any meaningful difference. I highly suspect that for the vast majority of applications out there, the difference is not noticeable.
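
    If you want to measure this yourself, a crude timing harness along these lines is a starting point. It is a sketch, not a rigorous benchmark: Shape, Square, totalArea, and measure are hypothetical names, the workload is artificial, and the numbers only mean anything in an optimized (-O) build, ideally averaged over repeated runs.

```swift
import Dispatch

protocol Shape {
    func area() -> Double
}

struct Square: Shape {
    var side: Double
    func area() -> Double { side * side }
}

// Existential version: each element is a boxed `any Shape`.
func totalArea(existential shapes: [any Shape]) -> Double {
    var total = 0.0
    for shape in shapes { total += shape.area() }
    return total
}

// Generic version: the element type is known to the compiler at the call site.
func totalArea<S: Shape>(generic shapes: [S]) -> Double {
    var total = 0.0
    for shape in shapes { total += shape.area() }
    return total
}

// Runs `work` once, prints the elapsed wall-clock time, and returns its result.
func measure(_ label: String, _ work: () -> Double) -> Double {
    let start = DispatchTime.now().uptimeNanoseconds
    let result = work()
    let elapsed = DispatchTime.now().uptimeNanoseconds - start
    print("\(label): \(Double(elapsed) / 1_000_000) ms")
    return result
}

// A million values, matching the scale mentioned in the question.
let squares = (1...1_000_000).map { Square(side: Double($0)) }
let boxed: [any Shape] = squares

let existentialTotal = measure("any") { totalArea(existential: boxed) }
let genericTotal = measure("generic") { totalArea(generic: squares) }
```

    Whatever numbers this prints on your machine, compare them against how long the rest of your app's work takes before deciding that a refactor is worthwhile.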