map vs flatMap in Reactor

I've found a lot of answers regarding RxJava, but I want to understand how it works in Reactor.

My current understanding is very vague. I tend to think of map as synchronous and flatMap as asynchronous, but I can't really get my head around it.

Here is an example:

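files.flatMap { file ->
    // Resolve the destination on disk for this uploaded part
    Mono.just(Paths.get(UPLOAD_ROOT, file.filename()).toFile())
        .map { destFile ->
            destFile.createNewFile() // blocking file-system call
            destFile
        }
        // transferTo returns a Mono<Void>, hence flatMap
        .flatMap(file::transferTo)
}.then()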

I have files (a Flux<FilePart>) and I want to copy them to some UPLOAD_ROOT on the server.

This example is taken from a book.

I can change all the .map to .flatMap and vice versa and everything still works. I wonder what the difference is.

Solution

map is for synchronous, non-blocking, 1-to-1 transformations
flatMap is for asynchronous (non-blocking) 1-to-N transformations

The difference is visible in the method signature:

map takes a Function<T, U> and returns a Flux<U>
flatMap takes a Function<T, Publisher<V>> and returns a Flux<V>

That's the major hint: you can pass a Function<T, Publisher<V>> to a map, but it wouldn't know what to do with the Publishers, and that would result in a Flux<Publisher<V>>, a sequence of inert publishers.

On the other hand, flatMap expects a Publisher<V> for each T. It knows what to do with it: subscribe to it and propagate its elements in the output sequence. As a result, the return type is Flux<V>: flatMap will flatten each inner Publisher<V> into the output sequence of all the Vs.
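To make the signature difference concrete, here is a minimal Kotlin sketch. It assumes reactor-core is on the classpath, and `lookup` is a hypothetical function invented for illustration; it simply wraps a value in a Mono.

```kotlin
import reactor.core.publisher.Flux
import reactor.core.publisher.Mono

// Hypothetical async lookup (an assumption for this sketch): returns a publisher.
fun lookup(id: Int): Mono<String> = Mono.just("user-$id")

// map applies the function and emits whatever it returns.
// Here that is the Mono itself, so nothing ever subscribes to the inner publishers.
fun mapped(): Flux<Mono<String>> = Flux.just(1, 2, 3).map { id -> lookup(id) }

// flatMap subscribes to each inner publisher and merges its elements into the output.
fun flattened(): Flux<String> = Flux.just(1, 2, 3).flatMap { id -> lookup(id) }

fun main() {
    // block() is used only to observe the result in a demo; avoid it in reactive code.
    println(mapped().count().block())          // number of (unsubscribed) Monos
    println(flattened().collectList().block()) // the flattened String elements
}
```

Note that the two functions differ only in the operator name; the compiler-visible return types (Flux<Mono<String>> versus Flux<String>) are what tell them apart.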

About the 1-N aspect:

For each input element T, flatMap maps it to a Publisher<V>. In some cases (e.g. an HTTP request), that publisher will emit only one item, in which case we're pretty close to an async map.

But that's the degenerate case. The generic case is that a Publisher can emit multiple elements, and flatMap works just as well.

As an example, imagine you have a reactive database and you flatMap over a sequence of user IDs, issuing for each one a request that returns that user's set of Badge objects. You end up with a single Flux<Badge> containing all the badges of all these users.
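The badge scenario could be sketched roughly as follows. The `Badge` class and the `badgesOf` function are hypothetical stand-ins for a real reactive repository, with hard-coded data so that the example runs on its own.

```kotlin
import reactor.core.publisher.Flux

// Hypothetical domain type for this sketch.
data class Badge(val userId: Int, val name: String)

// Hypothetical stand-in for a reactive database query: 0..N badges per user.
fun badgesOf(userId: Int): Flux<Badge> = when (userId) {
    1 -> Flux.just(Badge(1, "Editor"), Badge(1, "Critic"))
    2 -> Flux.just(Badge(2, "Scholar"))
    else -> Flux.empty<Badge>()
}

// One input element can yield zero, one, or many output elements (1-to-N).
fun allBadges(userIds: Flux<Int>): Flux<Badge> =
    userIds.flatMap { id -> badgesOf(id) }

fun main() {
    // block() is for demonstration only.
    println(allBadges(Flux.just(1, 2, 3)).collectList().block())
}
```

With a real asynchronous source, flatMap gives no guarantee that badges arrive in the order of the user IDs, since inner publishers are merged as they emit.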

Is map really synchronous and non-blocking?

Yes: it is synchronous in the way the operator applies it (a simple method call, after which the operator emits the result) and non-blocking in the sense that the function itself shouldn't block the operator calling it. In other words, it shouldn't introduce latency. That's because a Flux is still asynchronous as a whole. If the function blocks mid-sequence, it will impact the rest of the Flux processing, or even other Flux instances.

If your map function is blocking or introduces latency but cannot be converted to return a Publisher, consider publishOn/subscribeOn to offload that blocking work to a separate thread.
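Two possible ways to do that are sketched below. This is an illustration rather than the only approach: `slowUppercase` is a hypothetical blocking function, and `Schedulers.boundedElastic()` assumes Reactor 3.3 or later.

```kotlin
import reactor.core.publisher.Flux
import reactor.core.publisher.Mono
import reactor.core.scheduler.Schedulers

// Hypothetical blocking call; the sleep stands in for slow I/O.
fun slowUppercase(s: String): String {
    Thread.sleep(50)
    return s.uppercase()
}

// Option 1: wrap the blocking call in a Mono and run each one on a scheduler
// meant for blocking work. Results may arrive out of order.
fun viaFlatMap(input: Flux<String>): Flux<String> =
    input.flatMap { s ->
        Mono.fromCallable { slowUppercase(s) }
            .subscribeOn(Schedulers.boundedElastic())
    }

// Option 2: keep map, but move everything downstream of publishOn to another
// thread. Elements are still processed one at a time, in order.
fun viaPublishOn(input: Flux<String>): Flux<String> =
    input.publishOn(Schedulers.boundedElastic())
        .map { slowUppercase(it) }

fun main() {
    // block() is for demonstration only.
    println(viaFlatMap(Flux.just("a", "b", "c")).collectList().block())
    println(viaPublishOn(Flux.just("a", "b", "c")).collectList().block())
}
```

The first option lets several blocking calls run concurrently; the second only keeps the blocking work off the thread that produced the elements.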