
High-load ≠ high-cost: how to optimize infrastructure without losing reliability [ukr]

In high-load systems, infrastructure costs often grow not because of the load itself, but because of inefficient architectural decisions: overprovisioning, excessive use of managed services, unnecessary data movement, incorrect SLA decisions, and the lack of a transparent cost model. At the conference, we will discuss how to approach cost optimization as a full-fledged architectural practice, rather than a one-time resource reduction exercise. We will also cover workload profile analysis, identifying real bottlenecks, building unit economics for infrastructure, traffic optimization, caching, CDN, observability, and controlled service degradation. Separately, we will look at the trade-off between performance, reliability, and cost: where maximum fault tolerance is truly required, where an eventually consistent approach is sufficient, and where managed solutions should be replaced with a simpler self-hosted architecture.

Ihor Zakutynskyi

(CTO, FORMA, Universe Group),
Highload fwdays'26 conference
From Logging Chaos to Controlled Pipelines [ukr]

This is a story about real pain and the maturation of a logging system. We’ll examine how the lack of standards breaks observability and why Kubernetes became the point of no return, forcing us to rethink our entire logging approach. We’ll walk through the requirements and architectural decisions that helped us regain control. I’ll share hands-on experience in building controlled, production-grade log pipelines, without magic and without “silver bullet” tools. This is an honest story from real production environments.

Olexandr Shevchenko

(DevOps Engineer, ONSEO),
DevOps fwdays'26 conference
How we spent 72 hours chasing 5 seconds [recorded talk]

This talk demonstrates practical approaches to unified observability, where metrics, logs, traces, and profiles are integrated for rapid diagnostics in distributed systems. We will cover data correlation techniques using trace IDs and labels to enable instant navigation from errors to specific spans, setting up continuous profiling for preview environments, using flame charts for performance analysis, and leveraging dependency maps and service graphs to visualize architecture. Special attention is given to AI-specific aspects: applying AI assistants to automate root cause analysis and implementing AI Evals for systematic evaluation of the quality, correctness, and reliability of AI systems.

Denys Vasyliev

(Principal Site Reliability Engineer / UK Global Talent Visa Holder),
DevOps fwdays'26 conference
Feelings versus facts: why metrics are more important than intuition [ukr]

We are used to trusting our feelings: it seems that the processes are working and the product is of high quality. But feelings do not scale. In this talk, I will show how we moved from intuitive decisions to a system of metrics that measures the quality of products and processes in real time, and how teams use a dashboard to manage the quality of their products as they develop them. Most importantly, I will show how technical metrics become understandable to the business, give both sides a common language for discussing risks, and support decisions at scale.

Igor Drozd

(CTO, Silpo (E-commerce)),
CTO fwdays'25 conference
Validation and Observability of AI Agents [ukr]

I will talk about the validation and monitoring of AI agents using the example of a mobile application that interacts with a multi-agent system via OpenAPI. I will demonstrate practical approaches to testing agent logic, methods for collecting performance metrics, and setting up an observability system. I will share my experience in tracking agent behavior in real time, detecting anomalies, and ensuring the reliability of a multi-agent architecture in production.

Oleksandr Denisyuk

(CTO at Ukrposhta),
Fwdays+DevRain AI
Observability with Elasticsearch: Best Practices for High-Load Platform [ukr]

The session covers architectural strategies that keep Elasticsearch stable under heavy load: proper index and shard design, ILM policies, persistent queues in Logstash, metrics downsampling, and self-monitoring of the observability stack. Insights are drawn from operating platforms that handle terabytes of logs and millions of events per day.

Anton Pryhodko

(EPAM, Systems Architect),
Highload fwdays'25 conference