In many systems, analytics are built directly into the backend: events, workers, and enrichment through dozens of database queries and calls to other services. In our case, a single analytics event generated up to 10 database queries, which, at a scale of millions of events, placed a significant load on production.

In this talk, we'll explain how we completely changed our approach:
- we switched from application events to CDC via Debezium;
- we started feeding changes from each table directly into the data pipeline;
- we moved enrichment and aggregations to BigQuery;
- and we effectively removed the analytics load from backend services.

As a result:
- we eliminated millions of read queries to the production database;
- reduced the complexity of the backend code;
- separated OLTP from analytics;
- and made building analytics significantly faster.

Let's talk separately about a less obvious benefit: now, to build new analytical scenarios, all you need is one Data Engineer, an ERD diagram, and modern AI tools, without involving the backend team and without changes to the production code.

We'll also examine:
- where CDC actually delivers value, and where it doesn't;
- what issues arise (lag, duplicates, schema changes);
- how the system's cost changes;
- and why "the same data" in the new architecture isn't free.
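To make the CDC step concrete, here is a minimal sketch of what a Debezium connector registration can look like for a PostgreSQL source. All hostnames, credentials, and table names below are hypothetical placeholders, not our production setup:

```json
{
  "name": "orders-cdc-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "prod-db.internal",
    "database.port": "5432",
    "database.user": "cdc_reader",
    "database.password": "${secrets:cdc_password}",
    "database.dbname": "app",
    "topic.prefix": "app",
    "plugin.name": "pgoutput",
    "table.include.list": "public.orders,public.users",
    "snapshot.mode": "initial"
  }
}
```

With a config like this, row-level changes to the listed tables flow into Kafka topics (e.g. `app.public.orders`) without any application code emitting events; downstream, a sink can load those topics into BigQuery for enrichment.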
Yozhef Hisem
(Solution Architect @ MacPaw),
Oleksandr Krakovetskyi
(CEO at DevRain)