How we stopped building analytics directly in the code: CDC, BigQuery, and the new role of the Data Engineer [ukr]
In many systems, analytics are built directly into the backend: events, workers, and enrichment through dozens of database queries and calls to other services. In our case, a single analytics event generated up to 10 database queries, which, at a scale of millions of events, placed a significant load on production.
In this talk, I’ll explain how we completely changed our approach:
- we switched from application events to CDC via Debezium;
- we started feeding changes from each table directly into the data pipeline;
- we moved enrichment and aggregations to BigQuery.
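As an illustration of the first step, a Debezium Postgres source connector for a pipeline like this is typically registered with Kafka Connect using a config along these lines (the connector name, host, credentials, and table list here are hypothetical placeholders, not the actual production setup):

```json
{
  "name": "app-cdc-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "prod-db.internal",
    "database.port": "5432",
    "database.user": "cdc_reader",
    "database.password": "${secrets:cdc/password}",
    "database.dbname": "app",
    "topic.prefix": "app",
    "table.include.list": "public.orders,public.users",
    "plugin.name": "pgoutput",
    "snapshot.mode": "initial"
  }
}
```

With a config like this, each change to the listed tables lands on its own Kafka topic (e.g. `app.public.orders`), from where it can be streamed into BigQuery without touching the application code.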
As a result:
- we eliminated millions of read queries to the production database;
- reduced the complexity of the backend code;
- separated OLTP from analytics;
- made building analytics significantly faster.
Let’s talk separately about a less obvious benefit:
now, to build new analytical scenarios, all you need is one Data Engineer, an ERD diagram, and modern AI tools—without involving the backend team and without changes to the production code.
We’ll also examine:
- where CDC actually delivers value, and where it doesn’t;
- what issues arise (lag, duplicates, schema changes);
- how the system’s cost changes.
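On the duplicates issue: CDC delivery is at-least-once, so the same change event can arrive more than once and must be deduplicated downstream. In BigQuery this is usually done with a window function over the primary key, but the idea can be sketched in plain Python, assuming each CDC record carries the row's primary key and a monotonically increasing source position (e.g. a Postgres LSN); the field names here are illustrative, not Debezium's exact payload layout:

```python
def deduplicate(changes):
    """Keep only the latest change per primary key.

    Each change is a dict with 'pk' (primary key) and 'lsn'
    (monotonic source position). Because delivery is
    at-least-once, the same change may appear more than once;
    later (or equal) LSNs win.
    """
    latest = {}
    for change in changes:
        current = latest.get(change["pk"])
        if current is None or change["lsn"] >= current["lsn"]:
            latest[change["pk"]] = change
    return list(latest.values())

# Example: a redelivered duplicate and an out-of-order older change
events = [
    {"pk": 1, "lsn": 100, "status": "created"},
    {"pk": 1, "lsn": 101, "status": "paid"},
    {"pk": 1, "lsn": 101, "status": "paid"},   # redelivered duplicate
    {"pk": 2, "lsn": 50, "status": "created"},
]
print(deduplicate(events))
```

The same "latest LSN per key" rule is what a `ROW_NUMBER() ... ORDER BY lsn DESC` window in BigQuery expresses declaratively.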
Yozhef Hisem
Solution Architect @ MacPaw
- Speaker at Fwdays (PHP & Architecture Talks), DOU and YouTube channels
- A regular mentor in the Intern MacPaw educational program: for four years in a row he has helped onboard newcomers to real projects
- Shares his experience in architecture and testing, in particular with BDD, Symfony, Redis, Docker, and modern API solutions
- You can catch Yozhef on the Fwdays stage, read his articles on DOU, or listen to his interviews on the "It’s raining cats & dogs" YouTube channel
- GitHub, Medium, LinkedIn, Facebook