How we stopped building analytics directly in the code: CDC, BigQuery, and the new role of the Data Engineer [ukr]
In many systems, analytics are built directly into the backend: events, workers, and enrichment through dozens of database queries and calls to other services. In our case, a single analytics event generated up to 10 database queries, which, at a scale of millions of events, placed a significant load on production.
In this talk, I’ll explain how we completely changed our approach:
- we switched from application events to CDC via Debezium;
- we started feeding changes from each table directly into the data pipeline;
- we moved enrichment and aggregations to BigQuery.
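As an illustration of the first step, a Debezium Postgres source connector for a pipeline like this is typically registered with Kafka Connect using a config along these lines (the connector name, host, credentials, and table list here are hypothetical placeholders, not the actual production setup):

```json
{
  "name": "app-cdc-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "prod-db.internal",
    "database.port": "5432",
    "database.user": "cdc_reader",
    "database.password": "${secrets:cdc/password}",
    "database.dbname": "app",
    "topic.prefix": "app",
    "table.include.list": "public.orders,public.users",
    "plugin.name": "pgoutput",
    "snapshot.mode": "initial"
  }
}
```

With a config like this, each change to the listed tables lands on its own Kafka topic (e.g. `app.public.orders`), from where it can be streamed into BigQuery without touching the application code.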
As a result:
- we eliminated millions of read queries to the production database;
- reduced the complexity of the backend code;
- separated OLTP from analytics;
- made building analytics significantly faster.
Let’s talk separately about a less obvious benefit:
now, to build new analytical scenarios, all you need is one Data Engineer, an ERD diagram, and modern AI tools—without involving the backend team and without changes to the production code.
We’ll also examine:
- where CDC actually delivers value, and where it doesn’t;
- what issues arise (lag, duplicates, schema changes);
- how the system’s cost changes.
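On the duplicates issue: CDC delivery is at-least-once, so the same change event can arrive more than once and must be deduplicated downstream. In BigQuery this is usually done with a window function over the primary key, but the idea can be sketched in plain Python, assuming each CDC record carries the row's primary key and a monotonically increasing source position (e.g. a Postgres LSN); the field names here are illustrative, not Debezium's exact payload layout:

```python
def deduplicate(changes):
    """Keep only the latest change per primary key.

    Each change is a dict with 'pk' (primary key) and 'lsn'
    (monotonic source position). Because delivery is
    at-least-once, the same change may appear more than once;
    later (or equal) LSNs win.
    """
    latest = {}
    for change in changes:
        current = latest.get(change["pk"])
        if current is None or change["lsn"] >= current["lsn"]:
            latest[change["pk"]] = change
    return list(latest.values())

# Example: a redelivered duplicate and an out-of-order older change
events = [
    {"pk": 1, "lsn": 100, "status": "created"},
    {"pk": 1, "lsn": 101, "status": "paid"},
    {"pk": 1, "lsn": 101, "status": "paid"},   # redelivered duplicate
    {"pk": 2, "lsn": 50, "status": "created"},
]
print(deduplicate(events))
```

The same "latest LSN per key" rule is what a `ROW_NUMBER() ... ORDER BY lsn DESC` window in BigQuery expresses declaratively.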
Yozhef Hisem
Solution Architect @ MacPaw
- Speaker at Fwdays (PHP & Architecture Talks), DOU and YouTube channels
- A regular mentor in the Intern MacPaw educational program: for four years in a row he has helped onboard newcomers to real projects
- Shares his experience in architecture and testing, in particular with BDD, Symfony, Redis, Docker, and modern API solutions
- You can catch Yozhef on the Fwdays stage, read his articles on DOU, or listen to his interviews on the "It’s raining cats & dogs" YouTube channel
- GitHub, Medium, LinkedIn, Facebook