Stream Processing Your Way with Apache Flink?

Apache Flink has become the standard piece of stream processing infrastructure for applications with difficult-to-satisfy demands for scalability, high performance, and fault tolerance, all while managing large amounts of application state.

The key to demystifying Apache Flink is to understand how the combination of stream processing plus application state has influenced its design and APIs. A framework that cared only about batch processing, or one that performed only stateless stream processing, would be much simpler.

We'll explore how Flink's managed state is organized, and how this relates to the programming model exposed by its APIs. We'll look at checkpointing: how it works, the correctness guarantees that Flink offers, how state snapshots are organized on disk, and what happens during recovery and rescaling.

We'll also look at watermarking, which is a major source of complexity and confusion for new Flink developers. Watermarking epitomizes the requirement Flink has to manage application state in a way that doesn't explode as those applications run continuously on unbounded streams.

You'll leave with a good mental model of Apache Flink, ready to use it in your own stateful stream processing applications.


About Tim Berglund

Tim is a teacher, author, and technology leader with Confluent, where he serves as the Vice President of Developer Relations. He is a regular speaker at conferences and a presence on YouTube explaining complex technology topics in an accessible way. He tweets as @tlberglund and blogs every few years at http://timberglund.com. He has three grown children and two grandchildren, a fact about which he is rather excited.
