databricks spark-structured-streaming

Fixed interval micro-batches and AvailableNow Trigger


What is the fundamental difference between the "Fixed interval micro-batches" and "AvailableNow" triggers?

I find the documentation around those confusing.

Is the fundamental difference that an AvailableNow query shuts down when it has finished, whereas a fixed interval micro-batch query never shuts down?

As far as I understand the documentation, AvailableNow does not mean a single micro-batch containing everything available. Depending on the configured batch size, it may consume multiple micro-batches, up to whatever was available when the job was triggered. Am I understanding this correctly?

The other thing I find confusing in the documentation is that the micro-batch size is set by a property such as maxBytesPerTrigger (depending on the data source). If AvailableNow represents a single trigger, that is a problem. So does AvailableNow actually mean multiple triggers?


Solution

  • The AvailableNow trigger processes all the data that is available in the source at the time the query starts. It can process that data in multiple micro-batches, using whatever stream configurations you have set, such as maxBytesPerTrigger. Once it has finished processing all of that data, the query exits and is no longer running on your cluster.

    The Fixed interval micro-batch trigger runs a single micro-batch at every interval that you specify. Each micro-batch respects your stream configurations, such as maxBytesPerTrigger. Unlike AvailableNow, this trigger never exits on its own. It keeps running until you manually stop it (via query.stop()) or it encounters an exception.

    AvailableNow is useful if you want to incrementally process your source on a one-off basis. Let's say you have some data in S3, and every now and then you want to process the data that has arrived since the last run. You can spin up a query with AvailableNow; it will process all of that data and then exit. But if you want more real-time processing, you can use a fixed interval trigger.
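
To make the difference concrete, here is a minimal PySpark sketch of both triggers. It assumes a Delta source and sink (where maxBytesPerTrigger is a supported read option) and a Spark version recent enough to accept the `availableNow` keyword. The helper names, the paths, and the `1g` cap are illustrative assumptions, not something from the question.

```python
# Sketch only: helper names, paths and the "1g" cap are hypothetical.
# `spark` is expected to be an existing SparkSession.

def build_stream(spark, source_path, max_bytes="1g"):
    """Streaming read of a Delta table; each micro-batch is capped by
    maxBytesPerTrigger (a soft limit), whichever trigger is used."""
    return (
        spark.readStream.format("delta")
        .option("maxBytesPerTrigger", max_bytes)
        .load(source_path)
    )


def run_available_now(spark, source_path, target_path, checkpoint_path):
    """Process everything available at start, possibly as several
    micro-batches, then let the query stop by itself."""
    query = (
        build_stream(spark, source_path)
        .writeStream.format("delta")
        .option("checkpointLocation", checkpoint_path)
        .trigger(availableNow=True)
        .start(target_path)
    )
    # Returns once the backlog has been processed and the query has exited.
    query.awaitTermination()
    return query


def run_fixed_interval(spark, source_path, target_path, checkpoint_path,
                       interval="1 minute"):
    """Start one micro-batch per interval; the query keeps running until
    query.stop() is called or it fails with an exception."""
    return (
        build_stream(spark, source_path)
        .writeStream.format("delta")
        .option("checkpointLocation", checkpoint_path)
        .trigger(processingTime=interval)
        .start(target_path)
    )
```

In both cases the checkpoint location is what lets a later run pick up only the data that has not been processed yet, so a scheduled job calling something like `run_available_now` would behave incrementally from one run to the next.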