You’re probably using Lighthouse wrong: How do we misuse the most common tool to measure web performance?
These days, web performance is one of the most important things everyone wants to optimize in their apps, and the dramatic impact of a poorly optimized website on business is clear to everyone. Yet we as an industry completely fail to recognize its complexity and widely misuse the most common tool for measuring it: Google Lighthouse. If you’re one of those people who think that good performance equals a good Lighthouse score, you’ve also fallen into this trap, and this talk is for you. You will learn when Google Lighthouse audit results can trick you and how to make good decisions based on its output.
- Co-creator and CTO of the biggest Open Source e-commerce project - Vue Storefront
- An active member and official partner of Vue.js community, international speaker, and trainer
- In his free time, he writes on the Vue School blog about performance and Vue best practices
Hello. My name is Filip Rakowski. I am a co-founder and Chief Developer Experience Officer at Vue Storefront. I am also a Technology Council member of the MACH Alliance, a non-profit advocacy group that promotes open and best-of-breed enterprise technologies. Our members include companies like Netlify, AWS, and Algolia, and our goal is to propel the e-commerce industry forward.
For those of you unfamiliar with Vue Storefront, it is a solution that addresses the complex problem of building e-commerce storefronts, which is much more challenging than it seems. Fetching data from your first API endpoint might make you feel powerful, but from there to production is a lengthy journey. Vue Storefront provides a set of open-source tools that simplify this process for developers. It is fully open source, and you can find it on GitHub.
Back in 2019, I attended a conference and fell in love with the people and the city. It was a wonderful experience, and it's unfortunate that we can't all meet again and have fun together. I have immense respect for Ukrainians fighting for their freedom, considering them the real heroes for the entire Western world. The Russian war against Ukraine is something that shouldn't happen, and I hope this pointless conflict, driven by the ambition of a madman, will end soon.
Despite my respect for you and your country, I know you're here to listen to something else. As is usually the case with me, I'll talk about performance, or more precisely, measuring performance. Web performance is currently one of the most critical things everyone wants to optimize in their apps, and the impact of well-optimized websites on business metrics is clear to everyone. If you need compelling arguments for your boss to prioritize performance, check out the website WPO Stats.
As long as we used PCs and laptops as our primary machines for consuming the web, no one seemed concerned about the growing size of websites: both CPUs and internet bandwidth were improving faster than websites were growing. Everything changed when mobile became the preferred way of consuming the web. According to Google research from 2017, it took an average of 15 seconds to fully load a web page on a mobile phone, and looking at more recent data, the situation has only worsened. Would you wait that long? I don't think so. Around that time, awareness of the impact of poor mobile performance on business started to emerge, thanks to companies like Google. However, we still lack an easy way to link the two: performance and business.
Everything changed when Google Lighthouse started gaining popularity. I remember its introduction and rapid adoption in the e-commerce space during the progressive web apps hype around 2018. Everyone was obsessed with progressive web apps and web performance, yet very few had a deep understanding of either. Unfortunately, not much has changed since then. What makes Lighthouse so widely adopted is its simplicity: you run a test and get a number between 0 and 100 indicating how good or bad your website's performance is. This simplicity, however, is the root of the problem.
Reality is not that simple. Web performance and user experience cannot be measured as a single number. Additionally, there are numerous nuances around how a Lighthouse audit works, and to use it as a reliable source of knowledge, one must be aware of these nuances. You won't find all these details in the audit summary; although they recently added some information, it's still incomplete. But don't worry—we will navigate through all the nuances. By the end of this talk, you will understand exactly where things come from and when not to trust them, which is even more important. Let's start with a simple question.
What does Google Lighthouse really measure, and how can we use this information? Many people, I feel, never try to answer this question; they simply assume the score has to be high, and that's all they need to know. According to the Lighthouse website, its goal is to measure page quality, divided into four categories: performance, accessibility, best practices, and SEO. Together these aim to provide a comprehensive view of a website's quality and to predict real-world user experience as accurately as possible. However, it's essential to note that the audit won't give you definitive answers about your users' experience.
Google has always emphasized the performance score as the most crucial one. In the minds of the general audience, the Lighthouse score equals the performance score. While performance is undoubtedly a major factor influencing page quality and user experience, our obsession with just one of the four metrics creates a false impression that the only thing that matters to end users is performance. In reality, there are numerous factors influencing good user experience, and even all four Lighthouse metrics can't definitively determine if it's good or bad.
I assure you that the best way to start improving the experience is by talking to your users and learning how they use your app and what challenges they face. If you want to use Lighthouse data in your decision-making process, you need to put it in the right context. Making decisions without understanding the context can lead to logical but incorrect decisions. Body Mass Index (BMI) is a great analogy to the Lighthouse score. Does a BMI of 30 mean you're obese? Looking at the chart, you might confidently say yes, but understanding how the results should be interpreted reveals that this scale doesn't work for a large number of people.
Older adults, women, muscular individuals—all have different interpretations. The same goes for children and adolescents. The BMI algorithm fits a narrow group—non-muscular middle-aged males. Relying on raw numbers without broader context in business decisions often leads to bad outcomes. Making decisions without understanding where the number comes from can be even worse.
Let me quickly explain how the Lighthouse score is calculated to ensure we are all on the same page. The Lighthouse score is derived from a set of other metric scores, each with its own weight—some more important than others. The algorithm changes with each version, a crucial point to acknowledge. While it may seem unusual, Google continually refines the impact of individual metrics on user experience, resulting in more accurate weights. Therefore, it makes little sense to compare your current Lighthouse score to one from a year or two ago. The scoring system most likely changed during that time, potentially improving or decreasing due to algorithm adjustments, not necessarily because of changes you made. Many individuals have made this mistake, wrongly believing their website had become slower over time when, in fact, it was the algorithm that changed.
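The weighted-average idea behind the score can be sketched in a few lines. This is an illustrative sketch, not Lighthouse's actual implementation; the weights below approximate those documented for Lighthouse v10 (they have changed in other versions, which is exactly why scores from different versions aren't comparable):

```javascript
// Sketch: the overall performance score as a weighted average of
// individual metric scores (each normalized to 0..1 by Lighthouse).
// Weights approximate Lighthouse v10 and change between versions.
const WEIGHTS = {
  firstContentfulPaint: 0.10,
  speedIndex: 0.10,
  largestContentfulPaint: 0.25,
  totalBlockingTime: 0.30,
  cumulativeLayoutShift: 0.25,
};

function performanceScore(metricScores) {
  let total = 0;
  for (const [metric, weight] of Object.entries(WEIGHTS)) {
    total += weight * metricScores[metric];
  }
  return Math.round(total * 100); // reported on a 0..100 scale
}

// Example: strong paint metrics but heavy main-thread blocking
const score = performanceScore({
  firstContentfulPaint: 0.95,
  speedIndex: 0.90,
  largestContentfulPaint: 0.85,
  totalBlockingTime: 0.40,
  cumulativeLayoutShift: 1.0,
});
console.log(score); // 77
```

Note how the same metric scores would produce a different overall number under a different version's weights, which is why comparing scores across versions is misleading.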
Always focus on specific metrics, the low-level measurements, rather than the Lighthouse score itself, given its evolving nature. Now, let's delve into where this magical performance score comes from. We know part of the truth, namely which metrics contribute, but there's another crucial aspect: the testing environment. This factor matters significantly. Most people run their Lighthouse audits from Chrome DevTools, which is arguably the least reliable method. Numerous external factors influence the score, with download and rendering speed affected by CPU and network capabilities. Developers often run tests on high-end machines with fast internet, yielding better results than real users might experience.
Furthermore, browser extensions are treated as part of the audited website. Running the test in incognito mode excludes extensions, and applying CPU and network throttling can mitigate hardware differences, but results will still vary across devices. Running Lighthouse locally is therefore not reliable for comparing results with others or for gauging the quality of the end-user experience. In a company setting, everyone running the test is likely to produce different results due to varying environments. To obtain consistent results, it's advisable to set up Lighthouse CI in an external environment: Lighthouse CI runs Lighthouse audits as part of your CI pipeline on a dedicated machine. Alternatively, you can use external tools like SpeedCurve. For a quick website inspection, however, I recommend checking out PageSpeed Insights.
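The Lighthouse CI setup mentioned above can be as small as a single config file. A minimal sketch of a `lighthouserc.js` (the URL and budget values here are placeholders you would adapt to your project):

```javascript
// lighthouserc.js — minimal Lighthouse CI config sketch.
// Run with `lhci autorun` after installing @lhci/cli.
module.exports = {
  ci: {
    collect: {
      url: ['http://localhost:3000/'], // placeholder: your page(s) under test
      numberOfRuns: 3, // multiple runs smooth out run-to-run variance
    },
    assert: {
      assertions: {
        // Fail the build only on meaningful regressions, not on noise.
        'categories:performance': ['warn', { minScore: 0.8 }],
        'largest-contentful-paint': ['error', { maxNumericValue: 2500 }],
      },
    },
    upload: { target: 'temporary-public-storage' },
  },
};
```

Because the CI machine and throttling settings stay the same between runs, score changes are far more likely to reflect changes in your code than changes in the environment.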
PageSpeed Insights is a website created by Google itself that lets you run a quick Lighthouse test on virtually any website remotely, using a Google data center instead of your local machine. PSI (PageSpeed Insights) typically uses the data center closest to your location, occasionally switching to another if the closest one is under heavy load. This can lead to slightly different results for the same page when tests are run one after another, though recent caching improvements have made the scores more consistent; because of that caching, you may need to wait some time before rerunning a test to get a fresh result. Despite its more consistent scores, PSI still falls short of providing an entirely accurate measurement of real user experience.
PageSpeed Insights runs the mobile Lighthouse test on an emulated mid-range Motorola device. Unless all your users have a similar device, their experience of the website will differ. However, PSI also provides insights into how your website performs in the real world. At the top of every audit, you'll find the Core Web Vitals, which are crucial for positive SEO results, along with three other important performance metrics collected from your users that don't directly impact SEO. We'll explore these metrics in depth shortly. For now, remember that PSI offers two types of results: real-world results and synthetic results derived from running tests in Google data centers.
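Each Core Web Vital in that real-world section is rated against fixed thresholds published by Google (applied to the 75th percentile of field data). A minimal sketch of that classification, using the documented thresholds for LCP, FID, and CLS:

```javascript
// Core Web Vitals "good / needs improvement / poor" thresholds,
// as published by Google. LCP and FID in milliseconds; CLS is unitless.
const THRESHOLDS = {
  LCP: { good: 2500, poor: 4000 },
  FID: { good: 100, poor: 300 },
  CLS: { good: 0.1, poor: 0.25 },
};

function rate(metric, value) {
  const t = THRESHOLDS[metric];
  if (value <= t.good) return 'good';
  if (value <= t.poor) return 'needs improvement';
  return 'poor';
}

console.log(rate('LCP', 1800)); // "good"
console.log(rate('CLS', 0.3));  // "poor"
```

Note that these ratings come from real users, so they can disagree with the synthetic score on the same report, which is the key distinction this talk keeps returning to.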
Some companies even try to game the audit, for example by serving Lighthouse a stripped-down version of the page, and such tactics are more prevalent than you might imagine. A suspiciously high score relative to the website's complexity and its real loading experience may indicate an attempt to deceive. While I've highlighted challenges, it's not my intention to dismiss Lighthouse as useless. It's a valuable tool, with Google aiming to help developers identify potential performance bottlenecks affecting user experience. The issue lies not with the tool itself but in how we use it, or more precisely, how we misuse it. Google Lighthouse has contributed to a faster web more than any other tool. However, its name serves a purpose: a lighthouse guides you toward better page quality rather than delivering a definitive judgment of good or bad. Websites can have a great user experience with low Lighthouse scores, and vice versa. Achieving good performance is a trade-off; sacrifices must be made to improve it.
Sometimes it's an analytic script; other times, it's a feature. Eliminating some elements for better performance may not always be a prudent business decision; it requires careful consideration. To me, Lighthouse is most effective for quickly comparing different versions of websites to assess improvements or potential regressions. Implementing Lighthouse checks in your CI/CD pipelines with Lighthouse CI is valuable. Additionally, auditing websites with similar complexity in your industry provides a realistic benchmark for the scores you should be aiming for.
An e-commerce website rarely scores above 60, while a blog often hits 100, so it's important to know what a good score is in your specific case. Lighthouse is not a suitable tool for measuring actual user experience: synthetic data, such as Lighthouse's, will never tell you what your users experience. To understand your users, you need to collect data from them. Fortunately, you don't have to set up additional monitoring tools to do that. If you audit your page on PageSpeed Insights, at the very top you'll see how it scores against the most important real-world performance metrics, including the Core Web Vitals that impact your SEO results.
The data is collected over the previous 28 days from real users of your website browsing with Chrome. Keep this in mind when comparing scores before and after an update. The only requirement for collecting this data is that your website is crawlable, which essentially comes out of the box. This real-world data originates from the Chrome User Experience Report (CrUX), which collects performance metrics from the devices of real users running Google Chrome.
By default, when you're using Google Chrome, this data is sent to the Chrome User Experience Report. You can turn it off in the settings, but there's generally little reason to do so. You can easily access the history of your metrics in the CrUX dashboard in Google Data Studio. Now, with a better understanding of how Lighthouse works, when it can be tricked, and when not to use it, let's address the crucial difference between so-called lab data from Lighthouse, measured in a controlled environment, and real-world data from your users.
It's important not to optimize based solely on lab data, and here's why. If you run a PageSpeed Insights benchmark on several websites, you'll quickly notice that in many cases there is no correlation between the Lighthouse score and the performance data from real users. For instance, in the screenshot from an audit of the Adidas website, even though the Lighthouse score is rather high at 81, the real-world performance is terrible. Why? It could be due to cheating, or because users are on lower-end devices than the ones the website was optimized for. Understanding your users is paramount; the Lighthouse score is a useful indicator, but not sufficient on its own. So, thank you so much. I hope you enjoyed this talk. If you'd like to learn more about what I'm working on, I write about it in my newsletter. Additionally, if you're interested in performance, you can follow me on Twitter. Thanks for inviting me to this event, and have fun. Bye-bye.