AI has already become part of the modern engineering landscape, but things are much more complicated in high-load production systems. GenAI works well in demos and copilots, but is it ready for real-time processing, heavy workloads, and critical production scenarios? In this panel discussion, we’ll talk about why AI has yet to become the standard for high-load architectures, where the line between ML and GenAI lies, why inference is expensive, and why FinOps is becoming a new headache for engineering teams. We’ll discuss on-prem vs. cloud for AI workloads and real-world production constraints.
Oleksandr Savchenko
(CTO, Ministry of Digital Transformation of Ukraine)
Oleg Tsal-Tsalko
(CTO, EPAM)
Anton Boyko
(BoykoAnt.PRO)
Dmytro Nemesh
(Lalafo, CTO)

Most systems start with push because it feels natural: the server knows when something changes, so it notifies clients immediately. But at scale, push embeds a hidden cost multiplier: connected users × open items × update rate. Every millisecond of freshness is paid for by every connected user, even those who aren't watching. This talk is based on a real-world production case study at DraftKings. We’ll explore how the data delivery model shapes the cost curve, not just the latency profile. We walk through:
- the original push-based system and how it accumulated complexity over the years
- the scaling pain points that made it unsustainable under peak load
- the evaluation process that led to short polling as the answer
- a zero-downtime migration strategy across four phases
The results were counter-intuitive: a pull-based architecture with short polling outperformed push on data freshness at peak load, while achieving significant CPU and infrastructure cost reduction. The talk closes with a practical framework for deciding when push wins, when pull wins, and what question to ask first.
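The cost multiplier named in this abstract can be sketched as a back-of-the-envelope calculation. This is an illustrative model only, not the one used at DraftKings: the function names, the polling formula (one batched request per user per interval), and all numbers are assumptions.

```python
def push_messages_per_second(connected_users: int,
                             open_items_per_user: int,
                             updates_per_item_per_second: float) -> float:
    """Fan-out under push: every update to every open item goes to every connected user."""
    return connected_users * open_items_per_user * updates_per_item_per_second


def poll_requests_per_second(connected_users: int,
                             poll_interval_seconds: float) -> float:
    """Assumed pull model: each user sends one batched request per polling interval,
    no matter how many items changed in between."""
    return connected_users / poll_interval_seconds


# Hypothetical peak-load numbers, chosen only to show how the two curves differ.
users = 1_000_000
push_rate = push_messages_per_second(users, open_items_per_user=20,
                                     updates_per_item_per_second=2)
pull_rate = poll_requests_per_second(users, poll_interval_seconds=2.0)
print(f"push: {push_rate:,.0f} messages/s, pull: {pull_rate:,.0f} requests/s")
```

In this sketch the push cost grows with the update rate, while the pull cost depends only on the number of users and the polling interval. It counts messages and requests only and ignores payload size, caching, and connection overhead.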
Artem Kuzmyk
(Software Architect, DraftKings Inc.)

In this presentation, we will explore a practical use case of implementing effective infrastructure autoscaling using HPA, VPA, and Cluster Autoscaler. While working with the standard VPA, we encountered several limitations, including limited flexibility in configuring calculation intervals and conflicts when running concurrently with HPA. Consequently, we decided to develop our own custom VPA controller. In our new solution, we:
- Achieved stable coexistence of VPA and HPA on the same resources.
- Implemented a filtering mechanism for transient CPU spikes during the pod startup phase.
- Optimized the architecture by consolidating the functionality of three standard components into a single pod.
- Leveraged the In-Place Pod Resize capabilities available in Kubernetes 1.33.
Key result: optimized resource consumption and a 20–40% reduction in infrastructure costs.
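One way to picture the startup-spike filtering mentioned above is a warm-up window: samples taken shortly after the pod starts are ignored before a percentile-based recommendation is computed. This is a minimal sketch under stated assumptions, not the speaker's controller and not the upstream VPA recommender; `CpuSample`, `recommend_cpu_request`, the 120-second window, and the nearest-rank percentile are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CpuSample:
    seconds_since_pod_start: float
    millicores: int


def recommend_cpu_request(samples: Iterable[CpuSample],
                          warmup_seconds: float = 120.0,
                          percentile: float = 0.9) -> Optional[int]:
    """Return a CPU request (millicores) from steady-state samples only.

    Samples taken during the assumed warm-up window are dropped so that
    startup work (JIT, cache warming, migrations) does not inflate the result.
    """
    steady = sorted(s.millicores for s in samples
                    if s.seconds_since_pod_start >= warmup_seconds)
    if not steady:
        return None  # not enough steady-state data yet
    # Nearest-rank percentile: smallest value covering `percentile` of samples.
    rank = max(1, math.ceil(percentile * len(steady)))
    return steady[rank - 1]


samples = [
    CpuSample(5, 1900), CpuSample(30, 1500),      # startup spike
    CpuSample(150, 200), CpuSample(210, 250),
    CpuSample(270, 220), CpuSample(330, 240),     # steady state
]
print(recommend_cpu_request(samples))                     # startup samples ignored
print(recommend_cpu_request(samples, warmup_seconds=0))   # spike dominates
```

With the warm-up filter the recommendation follows steady-state usage; without it, the two startup samples push the 90th percentile to the peak value.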
Kostiantyn Tomakh
(DevOps Engineer, Uklon)

In high-load systems, infrastructure costs often grow not because of the load itself, but because of inefficient architectural decisions: overprovisioning, excessive use of managed services, unnecessary data movement, incorrect SLA decisions, and the lack of a transparent cost model. In this talk, we will discuss how to approach cost optimization as a full-fledged architectural practice, rather than a one-time resource reduction exercise. We will also cover workload profile analysis, identifying real bottlenecks, building unit economics for infrastructure, traffic optimization, caching, CDN, observability, and controlled service degradation. Separately, we will look at the trade-off between performance, reliability, and cost: where maximum fault tolerance is truly required, where an eventually consistent approach is sufficient, and where managed solutions should be replaced with a simpler self-hosted architecture.
Ihor Zakutynskyi
(CTO, FORMA, Universe)

In many systems, analytics are built directly into the backend: events, workers, and enrichment through dozens of database queries and calls to other services. In our case, a single analytics event generated up to 10 database queries, which, at a scale of millions of events, placed a significant load on production. In this talk, I’ll explain how we completely changed our approach:
- we switched from application events to CDC via Debezium;
- we started feeding changes from each table directly into the data pipeline;
- we moved enrichment and aggregations to BigQuery;
- we effectively removed the analytics load from backend services.
As a result, we:
- eliminated millions of read queries to the production database;
- reduced the complexity of the backend code;
- separated OLTP from analytics;
- made building analytics significantly faster.
I’ll also cover a less obvious benefit: building new analytical scenarios now takes just one Data Engineer, an ER diagram, and modern AI tools, without involving the backend team and without changes to the production code. We’ll also examine:
- where CDC actually delivers value, and where it doesn’t;
- what issues arise (lag, duplicates, schema changes);
- how the system’s cost changes;
- why “the same data” in the new architecture isn’t free.
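The duplicate problem listed above can be illustrated with an idempotent apply step. CDC pipelines commonly deliver events at least once, so a consumer may see the same change twice or see an older change redelivered. The sketch below is an assumption-laden illustration, not the speaker's pipeline: the event fields `key`, `position`, `op`, and `row` are simplified stand-ins for a real change-event envelope, and `position` is assumed to increase monotonically per key (for example, a log sequence number).

```python
from typing import Any, Dict, Iterable, Optional, Tuple


def apply_changes(events: Iterable[Dict[str, Any]]) -> Dict[Any, Dict[str, Any]]:
    """Build the latest state per primary key, ignoring duplicate and stale events."""
    latest: Dict[Any, Tuple[int, Optional[Dict[str, Any]]]] = {}
    for event in events:
        key, position = event["key"], event["position"]
        if key in latest and latest[key][0] >= position:
            continue  # already applied this position or a newer one
        # A delete ("d") is kept as a tombstone so later stale events stay ignored.
        row = None if event["op"] == "d" else event["row"]
        latest[key] = (position, row)
    return {key: row for key, (_, row) in latest.items() if row is not None}


events = [
    {"key": 1, "position": 10, "op": "c", "row": {"id": 1, "plan": "free"}},
    {"key": 1, "position": 11, "op": "u", "row": {"id": 1, "plan": "pro"}},
    {"key": 1, "position": 11, "op": "u", "row": {"id": 1, "plan": "pro"}},   # duplicate
    {"key": 1, "position": 10, "op": "c", "row": {"id": 1, "plan": "free"}},  # stale redelivery
    {"key": 2, "position": 12, "op": "c", "row": {"id": 2, "plan": "free"}},
    {"key": 2, "position": 13, "op": "d", "row": None},
]
print(apply_changes(events))
```

Applying the same batch twice gives the same state, which is the property that makes redelivery harmless. In a warehouse the equivalent is usually a merge or a "latest row per key" query rather than an in-memory dictionary.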
Yozhef Hisem
(Solution Architect @ MacPaw)

Most engineers eventually face the need to perform load testing: validating how a service scales, testing a new database, or running performance benchmarks for a new technology. At that point the obvious question arises: which tool should you use? Existing solutions work well for HTTP load testing, but they often become limiting when you need to test other protocols, model complex workload patterns (open vs closed systems, skewed distributions, hot partitions), or run distributed load testing in a cluster. In this talk, I will introduce NBomber, a load testing framework I created to address these challenges. We will cover:
- why there was a need to build a new tool despite the existence of Gatling, Locust, and k6
- using .NET and F# to build latency-sensitive systems
- the architecture of NBomber
- how NBomber Cluster works
- several practical use cases including database benchmarks, anomaly detection, Kubernetes integration, benchmark comparison, and performance trend analysis.
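The open vs closed distinction mentioned above can be made concrete with two small formulas. This is a generic queueing sketch, not NBomber's API; the function names and the numbers are assumptions. In a closed model a fixed pool of virtual users each waits for a response before sending the next request, so offered load drops when the system slows down. In an open model requests arrive at a fixed rate regardless of responses, so in-flight work piles up instead (Little's law).

```python
def closed_model_throughput(virtual_users: int,
                            response_time_s: float,
                            think_time_s: float = 0.0) -> float:
    """Requests per second generated by a fixed pool of users that wait for each response."""
    return virtual_users / (response_time_s + think_time_s)


def open_model_in_flight(arrival_rate_per_s: float,
                         response_time_s: float) -> float:
    """Average concurrent requests when arrivals ignore response time (Little's law)."""
    return arrival_rate_per_s * response_time_s


# Hypothetical service that degrades from 0.5 s to 2.0 s per request.
for latency in (0.5, 2.0):
    closed_rps = closed_model_throughput(virtual_users=100, response_time_s=latency)
    open_concurrency = open_model_in_flight(arrival_rate_per_s=200, response_time_s=latency)
    print(f"latency={latency}s closed={closed_rps:.0f} req/s open in-flight={open_concurrency:.0f}")
```

Under degradation the closed model quietly reduces the load it generates, while the open model keeps pushing and concurrency grows. That difference is why the choice of workload model changes what a benchmark actually measures.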
Anton Moldovan
(DraftKings & NBomber LLC)

Low Latency in High-Load Systems: From Redis Pub/Sub to In-Memory Runtime - a Toy That Went Too Far
In this talk, we will explore real-world experience in building a low-latency system for cross-exchange operations, where geography is just as important as algorithms. We will discuss why message brokers and classic microservices are not a good fit for HFT (High-Frequency Trading)-like scenarios, how in-memory state combined with regional runtime nodes provides predictable latency, and where the boundary lies between speed and consistency.
Dmytro Hnatiuk
(Senior Full Stack Developer at Everlabs)

Have you ever wondered what’s really going on under the hood of distributed systems? Not those “sort of a cluster” setups with 3 nodes, I mean the real deal. The exabyte-scale beasts. In this talk, we’ll peek behind the curtains of modern infrastructure. How do systems that crunch mountains of data actually work? What patterns, principles, and engineering decisions are hiding behind truly scalable architectures? Here’s what we’ll dive into:
- What the inner life of a distributed system really looks like
- How a distributed app is different from a distributed system
- How data storage patterns evolve into modern DBs, queues, and logs
- Why “PostgreSQL in the cloud” isn’t really PostgreSQL anymore
- Why Northguard might just outshine Kafka
- And how new players like NewSQL are changing the game
If you’re an architect, tech lead, developer or just curious about why infrastructure scales the way it does - come join! I’ll share insights you might use in your own projects (or at least see them from a new angle). P.S. Yep, there’ll be a bit of magic ✨ and a whole lot of hard truths about the distributed systems powering our world.
Oleksii Petrov
(Solution Architect @ Husqvarna Group)

JavaScript and Golang are two different worlds that often intersect in modern projects: the first dominates frontend development and rapid prototyping, while the second excels in high-load services and microservice architectures. In this talk, I will share my personal experience of moving from JS to Go, comparing approaches to asynchronous programming, application architecture, working with databases, and tooling. We’ll explore not only the differences but also the similarities that help developers adapt more easily between these ecosystems.
Valentyn Lapotkov
(StartupSoft, Senior Software Engineer)