Adi Polak

Director of Advocacy and Developer Experience Engineering, Confluent

Adi is an experienced Software Engineer and people manager. For most of her professional life, she has worked with data and machine learning for operations and analytics. As a data practitioner, she developed algorithms to solve real-world problems using machine learning techniques and leveraging expertise in Apache Spark, Kafka, HDFS, and distributed large-scale systems.

Adi has taught Spark to thousands of students and is the author of the successful book Scaling Machine Learning with Spark. Earlier this year, she embarked on a new adventure with data streaming, specifically Flink, and she can't get enough of it.

Presentations

Have you ever asked an AI language model like ChatGPT about the latest developments on a certain topic, only to receive this response: "I'm sorry, but as of my last knowledge update in January 2022, I don't have information on the topic at hand."? If you have, you've encountered a fundamental limitation of large language models. You can think of these models as time capsules of knowledge, frozen at the point of their last training. They can only learn new information by going through a retraining process, which is both time-consuming and computationally intensive.

In the fast-paced world of artificial intelligence, a new technique is emerging to tackle this challenge: Retrieval-Augmented Generation, or RAG. This innovative approach is changing how language models operate, breaking down barriers and opening up new possibilities. But what exactly is RAG? Why is it important? And how does it work? All this and more in this talk.
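To make the idea concrete, here is a minimal, self-contained sketch of the RAG pattern (illustrative only, not code from the talk): a toy keyword retriever pulls the most relevant documents from a small corpus, and the retrieved text is prepended to the prompt before it reaches a language model. The corpus, scoring function, and final model call are all placeholders for a real vector store and LLM API.

```python
# Minimal RAG sketch: retrieve relevant context, then augment the prompt.
# The corpus and the keyword scoring stand in for a real vector store.

CORPUS = [
    "Apache Flink 1.18 was released in October 2023.",
    "Retrieval-Augmented Generation combines search with text generation.",
    "Apache Kafka is a distributed event streaming platform.",
]

def retrieve(question: str, corpus: list[str], k: int = 2) -> list[str]:
    """Score documents by keyword overlap with the question (toy retriever)."""
    q_words = set(question.lower().split())
    scored = sorted(corpus, key=lambda doc: -len(q_words & set(doc.lower().split())))
    return scored[:k]

def build_prompt(question: str, context: list[str]) -> str:
    """Prepend retrieved context so the model can answer beyond its training cutoff."""
    context_block = "\n".join(f"- {doc}" for doc in context)
    return f"Answer using this context:\n{context_block}\n\nQuestion: {question}"

question = "What is the latest Apache Flink release?"
prompt = build_prompt(question, retrieve(question, CORPUS))
print(prompt)  # In a real system, this augmented prompt is sent to the LLM of your choice.
```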

In the ever-evolving landscape of technology and Generative AI, integrating DevOps principles into the machine learning (ML) lifecycle is a game-changer.

Join me for an insightful session where we will explore essential aspects such as MLflow, deployment patterns, and monitoring techniques for ML models. Gain a deeper understanding of how to navigate the complexities of deploying ML models into production environments. Discover best practices and proven strategies for monitoring and observing ML models in real-world scenarios.

By attending this session, you will acquire valuable insights and practical knowledge to overcome the unique hurdles of scaling and bringing AI into production. Unlock the full potential of your ML models by embracing the powerful integration of DevOps principles. This presentation is based on the extensive customer research I conducted while writing the bestselling book Scaling Machine Learning with Spark (https://www.amazon.com/Scaling-Machine-Learning-Spark-Distributed/dp/1098106822).
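As a small taste of the tooling covered, here is a minimal experiment-tracking sketch using MLflow and scikit-learn (illustrative, not material from the talk). It assumes both packages are installed and logs to the default local tracking store.

```python
# Minimal MLflow tracking sketch: log parameters, metrics, and a model artifact.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run(run_name="iris-rf-demo"):
    params = {"n_estimators": 100, "max_depth": 3}
    model = RandomForestClassifier(**params, random_state=42).fit(X_train, y_train)

    accuracy = accuracy_score(y_test, model.predict(X_test))

    # Everything logged here shows up in the MLflow UI (`mlflow ui`).
    mlflow.log_params(params)
    mlflow.log_metric("accuracy", accuracy)
    mlflow.sklearn.log_model(model, "model")
```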

Designing a distributed system architecture can be a daunting task, with contradictory requirements and constraints constantly at play. The CAP theorem, which captures the fundamental trade-offs in distributed data stores, is a classic example: developers can guarantee at most two of consistency, availability, and partition tolerance. The same applies to streaming infrastructure systems, where optimizing for one aspect can come at the cost of another. With cost, throughput, accuracy, and latency as the main constraints for streaming systems, it's crucial to make informed decisions that align with your business goals.

In this session, you'll gain valuable insights into how your system design choices impact your system's overall capabilities. You'll also learn about the differences between Flink Streaming and Spark Streaming, both conceptually and in practice. Lastly, you'll understand how combining multiple solutions can benefit your team and business. Join to learn more about the complex world of distributed stream processing systems.
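To ground the latency/throughput trade-off in something concrete, here is a minimal PySpark Structured Streaming sketch (illustrative, not from the talk) that counts events in tumbling windows from the built-in `rate` test source. The trigger interval and window size are the knobs that trade latency against throughput and cost.

```python
# Minimal windowed aggregation with Spark Structured Streaming (micro-batch model).
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window

spark = SparkSession.builder.appName("streaming-tradeoffs-demo").getOrCreate()

# The built-in "rate" source generates (timestamp, value) rows for testing.
events = (spark.readStream
          .format("rate")
          .option("rowsPerSecond", 100)
          .load())

# Tumbling 10-second windows: larger windows and longer trigger intervals raise
# throughput and lower cost, at the price of higher end-to-end latency.
counts = events.groupBy(window(col("timestamp"), "10 seconds")).count()

query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .option("truncate", "false")
         .trigger(processingTime="5 seconds")
         .start())

query.awaitTermination()
```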