Fwdays. IT conferences, workshops, courses and training for developers and IT specialists.

Agent in the Loop: Architecture for Highload Data Pipeline Recovery [ukr]

A real-world-inspired architecture talk about embedding an AI agent into the operational workflow of a highload data pipeline. We walk through a cascade failure scenario: corrupted data enters the pipeline, Kafka queues get stuck, storage pressure grows, thousands of Kubernetes pods start failing and rescheduling, etcd degrades, and PostgreSQL becomes a secondary pressure point. Then we show how an agent built with AWS Bedrock AgentCore, LangChain, and MCP/Gateway could detect early signals, isolate corrupted messages, suggest human-approved fixes, protect cluster stability, and turn noisy telemetry into actionable recovery steps.

Kyrylo Dubovyk

(AI Solutions Architect at EPAM | Founder “Digital Brain”),

Maksym Borodin

(Systems Architect @ EPAM),

Highload fwdays'26 conference
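The "isolate corrupted messages, suggest human-approved fixes" step from the Agent in the Loop abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the speakers' implementation. The message schema (`id`, `ts`, `payload`), the 10% quarantine threshold, and the names `triage`, `Proposal`, and `execute` are hypothetical. No Kafka, Bedrock AgentCore, LangChain, or MCP APIs are used. Invalid messages go to a quarantine list that stands in for a dead-letter topic, and any remediation the agent proposes stays inert until a human sets `approved`.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical message schema; a real pipeline would use its own contract.
REQUIRED_FIELDS = {"id", "ts", "payload"}


@dataclass
class Proposal:
    """A remediation step suggested by the agent; inert until a human approves it."""
    action: str
    reason: str
    approved: bool = False


@dataclass
class TriageResult:
    valid: List[bytes] = field(default_factory=list)
    quarantined: List[bytes] = field(default_factory=list)  # stands in for a dead-letter topic
    proposals: List[Proposal] = field(default_factory=list)


def is_valid(raw: bytes) -> bool:
    """Accept only JSON objects that carry every required field."""
    try:
        msg = json.loads(raw)
    except ValueError:  # covers both JSONDecodeError and UnicodeDecodeError
        return False
    return isinstance(msg, dict) and REQUIRED_FIELDS.issubset(msg)


def triage(batch: List[bytes], quarantine_threshold: float = 0.1) -> TriageResult:
    """Split a batch into valid and quarantined messages, and propose (never apply)
    a fix when the corrupted share of the batch exceeds the threshold."""
    result = TriageResult()
    for raw in batch:
        (result.valid if is_valid(raw) else result.quarantined).append(raw)
    ratio = len(result.quarantined) / max(len(batch), 1)
    if ratio > quarantine_threshold:
        result.proposals.append(
            Proposal(
                action="pause upstream producer",
                reason=f"{ratio:.0%} of batch failed validation",
            )
        )
    return result


def execute(proposal: Proposal, run: Callable[[str], None]) -> None:
    """Run an approved proposal; refuse anything a human has not signed off on."""
    if not proposal.approved:
        raise PermissionError(f"not approved: {proposal.action}")
    run(proposal.action)
```

For example, triaging one well-formed message and `b"not json"` keeps one, quarantines one, and, because 50% exceeds the threshold, returns a single proposal that `execute` rejects until `approved` is set to `True`. Keeping the approval check inside `execute` means no code path can act on a proposal without that flag.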

High-load ≠ high-cost: how to optimize infrastructure without losing reliability [ukr]

In high-load systems, infrastructure costs often grow not because of the load itself but because of inefficient architectural decisions: overprovisioning, excessive use of managed services, unnecessary data movement, poorly chosen SLAs, and the lack of a transparent cost model. In this talk, we will discuss how to approach cost optimization as a full-fledged architectural practice rather than a one-time resource-cutting exercise. We will also cover workload profiling, identifying real bottlenecks, building unit economics for infrastructure, traffic optimization, caching, CDNs, observability, and controlled service degradation. Separately, we will look at the trade-off between performance, reliability, and cost: where maximum fault tolerance is truly required, where an eventually consistent approach is sufficient, and where managed solutions should be replaced with a simpler self-hosted architecture.

Optimization Highload Performance Observability Infrastructure

Ihor Zakutynskyi

(CTO, FORMA, Universe),

Highload fwdays'26 conference
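The "unit economics for infrastructure" idea from the High-load ≠ high-cost abstract can be illustrated with a small calculation. Divide each service's monthly cost by the traffic it serves to get a cost per 1,000 requests, then rank services by that figure to see where optimization is likely to pay off first. This is a minimal sketch: the service names and figures are made up for illustration, and real cost models usually also allocate shared costs (networking, observability, support) that this ignores.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ServiceCost:
    name: str
    monthly_cost_usd: float
    monthly_requests: int

    def __post_init__(self) -> None:
        if self.monthly_requests <= 0:
            raise ValueError(f"{self.name}: monthly_requests must be positive")

    @property
    def cost_per_1k_requests(self) -> float:
        """Unit cost: dollars spent per 1,000 requests served."""
        return self.monthly_cost_usd / self.monthly_requests * 1000


def rank_by_unit_cost(services: Iterable[ServiceCost]) -> List[ServiceCost]:
    """Most expensive per request first: the likeliest optimization targets."""
    return sorted(services, key=lambda s: s.cost_per_1k_requests, reverse=True)


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    services = [
        ServiceCost("api-gateway", 1_200.0, 600_000_000),
        ServiceCost("search", 3_000.0, 50_000_000),
        ServiceCost("media-cdn", 2_500.0, 900_000_000),
    ]
    for s in rank_by_unit_cost(services):
        print(f"{s.name:12} ${s.cost_per_1k_requests:.4f} per 1k requests")
```

With these made-up numbers, `search` costs about $0.06 per 1,000 requests, roughly 30 times the gateway's $0.002, even though its total bill is only 2.5 times larger. That is the kind of gap a total-cost view hides.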

From Logging Chaos to Controlled Pipelines [ukr]

This is a story about real pain and the maturation of a logging system. We’ll examine how the lack of standards breaks observability and why Kubernetes became the point of no return, forcing us to rethink our entire logging approach. We’ll walk through the requirements and architectural decisions that helped us regain control. I’ll share hands-on experience in building controlled, production-grade log pipelines, without magic and without “silver bullet” tools. This is an honest story from real production environments.

DevOps Kubernetes Observability Pipelines

Olexandr Shevchenko

(DevOps Engineer, ONSEO),

DevOps fwdays'26 conference
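One concrete form of the "lack of standards" problem in From Logging Chaos to Controlled Pipelines is every service logging in its own shape. A minimal sketch of the opposite, assuming a Python service and the standard `logging` module: a formatter that always emits one JSON object per line with the same top-level keys, so a downstream pipeline can parse every service's logs the same way. The key names (`ts`, `level`, `service`, `logger`, `message`, `fields`, `error`) are illustrative choices, not a schema from the talk.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line with a fixed set of top-level keys."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname.lower(),
            "service": self.service,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Structured context lives under one predictable key instead of ad hoc top-level keys.
        extra_fields = getattr(record, "fields", None)
        if isinstance(extra_fields, dict):
            entry["fields"] = extra_fields
        if record.exc_info:
            entry["error"] = self.formatException(record.exc_info)
        # default=str keeps non-JSON values (datetimes, UUIDs) from breaking the line.
        return json.dumps(entry, ensure_ascii=False, default=str)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter(service="checkout"))
    log = logging.getLogger("orders")
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    log.info("order placed", extra={"fields": {"order_id": 42}})
```

The design choice here is to make the standard cheap to follow: services pass context through `extra={"fields": ...}` and never hand-build log strings, so the pipeline can rely on the shape.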

How we spent 72 hours chasing 5 seconds [recorded talk]

This talk demonstrates practical approaches to unified observability, where metrics, logs, traces, and profiles are integrated for rapid diagnostics in distributed systems. We will cover data correlation techniques using trace IDs and labels to enable instant navigation from errors to specific spans, setting up continuous profiling for preview environments, using flame charts for performance analysis, and leveraging dependency maps and service graphs to visualize architecture. Special attention is given to AI-specific aspects: applying AI assistants to automate root cause analysis and implementing AI Evals for systematic evaluation of the quality, correctness, and reliability of AI systems.

AI DevOps Observability Architecture

Denys Vasyliev

(Principal Site Reliability Engineer / UK Global Talent Visa Holder),

DevOps fwdays'26 conference
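The trace-ID correlation technique in How we spent 72 hours chasing 5 seconds relies on every log line carrying the ID of the trace it belongs to, so a log search can jump straight to the matching spans. A minimal sketch using only Python's standard library, assuming a Python service: a `contextvars` variable holds the current trace ID and a `logging.Filter` stamps it onto each record. In a real system the ID would come from the tracing SDK (for example OpenTelemetry) rather than from the hypothetical `trace_context` helper below, which just generates a random 32-hex-character ID in the W3C Trace Context format.

```python
import contextvars
import logging
import secrets
from contextlib import contextmanager
from typing import Iterator, Optional

# Trace ID for the current request; each thread and each asyncio task sees its own value.
current_trace_id = contextvars.ContextVar("current_trace_id", default=None)


class TraceIdFilter(logging.Filter):
    """Attach the current trace ID (or '-') to every record passing through."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = current_trace_id.get() or "-"
        return True


@contextmanager
def trace_context(trace_id: Optional[str] = None) -> Iterator[str]:
    """Hypothetical helper: bind a trace ID for the duration of a block."""
    bound = trace_id or secrets.token_hex(16)  # 16 random bytes -> 32 lowercase hex chars
    token = current_trace_id.set(bound)
    try:
        yield bound
    finally:
        current_trace_id.reset(token)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.addFilter(TraceIdFilter())
    handler.setFormatter(logging.Formatter("%(trace_id)s %(levelname)s %(message)s"))
    log = logging.getLogger("payments")
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    with trace_context():
        log.info("charge started")  # both lines share one trace ID
        log.info("charge finished")
```

The filter is attached to the handler rather than the logger so that records propagated from child loggers are stamped too; logger-level filters are not consulted for propagated records.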

Feelings versus facts: why metrics are more important than intuition [ukr]

We are used to trusting our feelings: it seems that processes are working and the product is of high quality. But feelings do not scale. In this talk, I will show how we moved from intuitive decisions to a system of metrics that measures the quality of products and processes in real time, and how teams use dashboards to manage the quality of their products. Most importantly, I will show how technical metrics become understandable to the business, give engineering and business a common language for discussing risks, and support decision-making at scale.

Application Monitoring Observability Engineering processes Business Growth

Igor Drozd

(CTO at Silpo (E-commerce)),

CTO fwdays'25 conference

Validation and Observability of AI Agents [ukr]

I will talk about the validation and monitoring of AI agents using the example of a mobile application that interacts with a multi-agent system via OpenAPI. I will demonstrate practical approaches to testing agent logic, methods for collecting performance metrics, and setting up an observability system. I will share my experience in tracking agent behavior in real time, detecting anomalies, and ensuring the reliability of a multi-agent architecture in production.

AI Observability

Oleksandr Denisyuk

(CTO at Ukrposhta),

Fwdays+DevRain AI
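The Validation and Observability of AI Agents abstract mentions collecting performance metrics and detecting anomalies in agent behavior. A minimal sketch of that idea, assuming agents are invoked as ordinary Python callables: `observed_call` (a hypothetical helper) times each call and counts it as a failure if it raises or if its output fails a caller-supplied contract check. `AgentMetrics` (also hypothetical) flags latencies more than three standard deviations above a rolling baseline. The z-score rule and window sizes are illustrative choices, not the speaker's tooling.

```python
import statistics
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Callable, Deque, List, Tuple


@dataclass
class AgentMetrics:
    window: int = 50         # how many recent latencies form the baseline
    min_baseline: int = 10   # don't judge anomalies before this many samples
    z_threshold: float = 3.0
    calls: int = 0
    failures: int = 0
    anomalies: List[Tuple[str, float]] = field(default_factory=list)
    latencies: Deque[float] = field(init=False)

    def __post_init__(self) -> None:
        self.latencies = deque(maxlen=self.window)

    def record(self, agent: str, latency_s: float, ok: bool) -> None:
        self.calls += 1
        if not ok:
            self.failures += 1
        if len(self.latencies) >= self.min_baseline:
            mean = statistics.fmean(self.latencies)
            stdev = statistics.pstdev(self.latencies)
            if stdev > 0 and (latency_s - mean) / stdev > self.z_threshold:
                self.anomalies.append((agent, latency_s))
        self.latencies.append(latency_s)  # deque drops the oldest sample beyond `window`

    @property
    def failure_rate(self) -> float:
        return self.failures / self.calls if self.calls else 0.0


def observed_call(
    metrics: AgentMetrics,
    agent: str,
    fn: Callable[..., Any],
    *args: Any,
    check: Callable[[Any], bool] = lambda result: True,
    **kwargs: Any,
) -> Any:
    """Invoke an agent, recording latency and whether its output passed `check`."""
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
    except Exception:
        metrics.record(agent, time.perf_counter() - start, ok=False)
        raise
    metrics.record(agent, time.perf_counter() - start, ok=bool(check(result)))
    return result
```

Passing the contract as `check` keeps "did the agent answer" separate from "did it answer correctly": a call that returns without raising but lacks a required field still counts toward `failure_rate`.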

Observability with Elasticsearch: Best Practices for High-Load Platform [ukr]

The session covers architectural strategies that keep Elasticsearch stable under heavy load: proper index and shard design, ILM policies, persistent queues in Logstash, metric downsampling, and self-monitoring of the observability stack. Insights are drawn from operating platforms that handle terabytes of logs and millions of events per day.

Highload Best practices Observability

Anton Pryhodko

(EPAM, Systems Architect),

Highload fwdays'25 conference
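Two of the practices listed in Observability with Elasticsearch, shard design and ILM policies, come down to sizing arithmetic plus a policy document. A minimal sketch, assuming daily indices and Elastic's general guidance of keeping shards roughly in the 10 to 50 GB range: `primary_shards` estimates a primary shard count from daily primary data volume, and `ilm_policy` builds a request body for the `PUT _ilm/policy/<name>` API with a hot-phase rollover, a warm-phase force merge, and a delete phase. The 40 GB target, the phase timings, and both function names are illustrative assumptions, not values from the talk.

```python
import json
import math
from typing import Any, Dict


def primary_shards(daily_primary_gb: float, target_shard_gb: float = 40.0) -> int:
    """Primary shards per daily index so each shard lands near the target size.
    Replicas are not counted: they copy primaries rather than split the data further."""
    return max(1, math.ceil(daily_primary_gb / target_shard_gb))


def ilm_policy(
    max_shard_gb: int = 50,
    rollover_age: str = "1d",
    warm_after: str = "7d",
    delete_after: str = "30d",
) -> Dict[str, Any]:
    """Body for PUT _ilm/policy/<name>: roll over in hot, force-merge in warm, then delete."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {
                            "max_primary_shard_size": f"{max_shard_gb}gb",
                            "max_age": rollover_age,
                        }
                    }
                },
                "warm": {
                    "min_age": warm_after,
                    "actions": {"forcemerge": {"max_num_segments": 1}},
                },
                "delete": {"min_age": delete_after, "actions": {"delete": {}}},
            }
        }
    }


if __name__ == "__main__":
    print(primary_shards(300))  # 300 GB/day at ~40 GB per shard -> 8
    print(json.dumps(ilm_policy(), indent=2))
```

Rolling over on `max_primary_shard_size` as well as `max_age` caps shard growth on unusually heavy days, so the shard-count estimate does not have to be exact.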

Introducing Distributed Tracing in a Large Software System

.NET Distributed Tracing Visibility Observability

Kostiantyn Sharovarskyi

(Jooble),

.NET fwdays'23 conference
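The Introducing Distributed Tracing entry has no abstract, but a core piece of introducing distributed tracing into an existing system is propagating trace context between services. A minimal sketch of that step, assuming the W3C Trace Context `traceparent` header (format `version-traceid-parentid-flags`, e.g. `00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01`): parse the incoming header, keep its trace ID, and mint a new span ID for the outgoing call. It is written in Python for consistency with the other examples here even though the talk targets .NET. It only handles version `00`, the helper names are hypothetical, and production code should rely on a tracing SDK instead.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Optional

# Version 00: 32-hex trace-id, 16-hex parent-id, 2-hex flags, lowercase only.
_TRACEPARENT_V00 = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")
_SAMPLED = 0x01


@dataclass(frozen=True)
class SpanContext:
    trace_id: str
    span_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[SpanContext]:
    """Return the remote span context, or None if the header is invalid."""
    match = _TRACEPARENT_V00.fullmatch(header.strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    if trace_id == "0" * 32 or span_id == "0" * 16:  # all-zero IDs are invalid
        return None
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & _SAMPLED))


def outgoing_traceparent(parent: Optional[SpanContext], sample_new: bool = True) -> str:
    """Header for a downstream call: same trace ID, fresh span ID.
    Starts a new trace when there is no valid parent."""
    trace_id = parent.trace_id if parent else secrets.token_hex(16)
    sampled = parent.sampled if parent else sample_new
    span_id = secrets.token_hex(8)
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"
```

Honoring the parent's sampled flag instead of re-deciding per service keeps a trace either fully recorded or fully dropped across every hop.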
