This is a story about real pain and the maturation of a logging system. We’ll examine how a lack of standards breaks observability and why Kubernetes became the point of no return, forcing us to rethink our entire logging approach. We’ll walk through the requirements and architectural decisions that helped us regain control, and I’ll share hands-on experience building controlled, production-grade log pipelines, without magic and without “silver bullet” tools. This is an honest story from real production environments.
Olexandr Shevchenko
(DevOps Engineer, ONSEO)

This talk demonstrates practical approaches to unified observability, where metrics, logs, traces, and profiles are integrated for rapid diagnostics in distributed systems. We will cover data correlation techniques using trace IDs and labels to enable instant navigation from errors to specific spans, setting up continuous profiling for preview environments, using flame charts for performance analysis, and leveraging dependency maps and service graphs to visualize architecture. Special attention is given to AI-specific aspects: applying AI assistants to automate root cause analysis and implementing AI Evals for systematic evaluation of the quality, correctness, and reliability of AI systems.
Denys Vasyliev
(Principal Site Reliability Engineer / UK Global Talent Visa Holder)

We are used to trusting our gut: it feels as though the processes are working and the product is of high quality. But feelings don’t scale. In this talk, I will show how we moved from intuitive decisions to a system of metrics that measures the quality of products and processes in real time, how teams use dashboards to manage the development of their products from a quality perspective, and, most importantly, how technical metrics become understandable to the business, help discuss risks in a shared language, and support decision-making at scale.
Igor Drozd
(CTO, Silpo (E-commerce))

I will talk about the validation and monitoring of AI agents using the example of a mobile application that interacts with a multi-agent system via OpenAPI. I will demonstrate practical approaches to testing agent logic, methods for collecting performance metrics, and setting up an observability system. I will share my experience in tracking agent behavior in real time, detecting anomalies, and ensuring the reliability of a multi-agent architecture in production.
Oleksandr Denisyuk
(R&D Manager, MODUS X)

The session covers architectural strategies that keep Elasticsearch stable under heavy load: proper index and shard design, ILM policies, persistent queues in Logstash, metrics downsampling, and self-monitoring of the observability stack. Insights are drawn from operating platforms that handle terabytes of logs and millions of events per day.
Anton Pryhodko
(Systems Architect, EPAM)
Kostiantyn Sharovarskyi
(Jooble)