How we spent 72 hours chasing 5 seconds [recorded talk]
This talk demonstrates practical approaches to unified observability, in which metrics, logs, traces, and profiles are integrated for rapid diagnostics in distributed systems. We will cover data correlation using trace IDs and labels, which enables instant navigation from errors to specific spans; continuous profiling for preview environments; flame charts for performance analysis; and dependency maps and service graphs for visualizing architecture. Special attention is given to AI-specific aspects: applying AI assistants to automate root cause analysis and implementing AI Evals to systematically evaluate the quality, correctness, and reliability of AI systems.
Denys Vasyliev
Principal Site Reliability Engineer / UK Global Talent Visa Holder
- 17+ years in the industry: from engineer to CTO
- Certified open-source contributor
- Speaker: Fwdays, Xpdays, DevOpsDays, DevOps-DEX London
- Author of the Kubernetes DIY course and a series of courses on AI Reliability Engineering
- Author and host of the Telegram and YouTube channel "[in]correct DevOps"