Kafka Deployment at Scale
Criteo is a worldwide online advertising platform, building the next generation of digital advertising technologies. We manage billions of ad impressions, each one automatically personalized for its audience. This is truly big data and machine learning without the hyperbole!
We have been using Kafka in production for log aggregation and streaming for 4 years. Our data platform is spread over 8 data centers around the world, collecting and processing 7 million messages per second and generating more than 150 TB of data each day.
We started using Kafka as a big buffer between the online and batch worlds, each with its own constraints. Despite Kafka's high availability, our servers cannot afford to wait for unavailable partitions. We will present the framework we built around Kafka, including an SDK, our own C# Kafka client, and watermarks.
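The talk covers watermarks in detail; as a rough sketch of the general idea only (the names below are illustrative, not Criteo's actual SDK), a watermark over several partitions can be taken as the minimum of the latest event timestamp seen on each partition, so batch consumers know how far the stream is complete:

```python
# Hypothetical sketch of a per-partition watermark tracker (not
# Criteo's actual API): downstream batch jobs can safely process all
# events with a timestamp below the watermark, because every
# partition has already advanced past that point.

class WatermarkTracker:
    def __init__(self, partitions):
        # Latest event timestamp observed on each partition.
        self.latest = {p: 0 for p in partitions}

    def observe(self, partition, event_ts):
        # Events within a partition may arrive slightly out of order,
        # so keep the maximum timestamp seen so far.
        self.latest[partition] = max(self.latest[partition], event_ts)

    def watermark(self):
        # The pipeline is only complete up to the slowest partition.
        return min(self.latest.values())


tracker = WatermarkTracker(partitions=[0, 1, 2])
tracker.observe(0, 1050)
tracker.observe(1, 1100)
tracker.observe(2, 990)
print(tracker.watermark())  # -> 990: partition 2 lags behind
```

Taking the minimum is deliberately conservative: a single lagging or unavailable partition holds the watermark back, which is exactly the signal a batch consumer needs before closing a time window.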
This talk is a deep dive into our critical data pipeline, the architectural choices we made, and the tools we developed to maintain scalability, reliability, and performance. Join us for a look at our journey so far and the challenges we are facing now.
Nice to know prior to the talk: Basic experience with Apache Kafka.
The talk will be most interesting for Data Engineers and SREs.
- Software Engineer at Criteo.
- More than 10 years of experience in the industry.
- Currently part of the SRE Kafka team at Criteo, which builds the streaming platform.
- Worked for Grammarly in the past. Likes the JVM and functional programming. Fan of improving development productivity.