You’re probably using Lighthouse wrong: How do we misuse the most common tool to measure web performance? [eng]

Talk presentation

These days, web performance is one of the most important things everyone wants to optimize on their apps, and it's clear to everyone how dramatic the impact of a poorly optimized website is on business. Yet we as an industry completely fail in recognizing its complexity and widely misuse the most common tool to measure it — Google Lighthouse. If you’re one of those people thinking that good performance equals a good Lighthouse score, you’ve also fallen into this trap and this talk is for you. You will learn when the Google Lighthouse audit results can trick you and how to make good decisions based on its output.

Filip Rakowski

Vue Storefront

Co-creator and CTO of the biggest Open Source e-commerce project - Vue Storefront
An active member and official partner of Vue.js community, international speaker, and trainer
In his free time, he writes on the Vue School blog about performance and Vue best practices

Report transcription

Hello. My name is Filip Rakowski. I am co-founder and chief developer experience officer at Vue Storefront. I'm also a technology council member of the MACH Alliance. The MACH Alliance is a non-profit advocacy group that advocates for open, best-of-breed enterprise technologies. We have members like Netlify, AWS, and Algolia, and the goal of the MACH Alliance is to push the e-commerce industry forward. For those of you who don't know what Vue Storefront is, Vue Storefront is a solution to a quite complex problem: building e-commerce storefronts. It's much harder than it seems. You obviously feel very powerful after displaying data from the first API endpoint, but from there to production, it's a very long way. Vue Storefront is a set of open source tools for developers that just make it simpler. It's fully open source, so you can just check it out on GitHub.

And, you know, I was at this conference in 2019. And I really fell in love with people, with the city. It was a really nice experience. And honestly, it's terrible that we can't all meet together and have fun again. Just like it was back then. I have absolutely enormous respect for Ukrainians fighting for their freedom. You are the real heroes for the whole Western world. And the Russian war against Ukraine is, I think, something that just shouldn't happen. And, you know, I hope this pointless conflict made just to fulfill the ambition of a madman will end as soon as possible.

But despite my enormous respect for you and your country, I know you're here to listen about something else. So as it is usually with me, I will talk about performance. Or to be more precise, measuring performance. Because these days, web performance is one of the most important things everyone wants to optimize on their apps. And it's quite clear to everyone how big the impact of well-optimized websites is on business metrics. By the way, if you need a good source of arguments for your boss, for example, to take care of performance, you can check out this website, WPO stats.

And you may not believe me, but five years ago, when we were writing the first lines of code for Vue Storefront, the topic of front-end performance was almost non-existent in the web dev space. At that time, the JavaScript SPA frameworks were getting huge traction. AngularJS and React were already well-established tools. Evan You had just released the second major version of Vue.js, and only a few years later, it became one of the most popular SPA frameworks in the world. And the thing is, at that time, almost no one cared how fast the websites built with those technologies were. Now everyone says that putting so much JavaScript on the front-end was a bad idea. But it wasn't as clear then as it is now, because honestly, it's mostly due to the ecosystem.

And as long as we were using PCs and laptops as primary machines to consume the web, no one seemed to be concerned with the growing size of websites. Both CPU power and internet bandwidth were progressing faster than websites were growing in size. This all changed when mobile started to become the preferred way of consuming the web. According to Google research from 2017, it took on average 15 seconds to fully load a web page on a mobile phone, and if you compare that with the current chart, it's much worse right now. 15 seconds. Would you wait that long? I don't think so. So at that time, the awareness about the impact of poor mobile performance on the business started to emerge, thanks to companies like Google, actually. But we were still lacking an easy way to link those two components: performance and business.

And everything changed when Google Lighthouse started gaining popularity. I remember when it was introduced and was rapidly adopted in the e-commerce space, following the progressive web apps hype around 2018. Everyone was obsessed with progressive web apps. Everyone was obsessed with web performance. And almost no one knew anything about either. Unfortunately, not much has changed since then. So what makes Lighthouse so widely adopted is its simplicity. You run a test and get a number between 1 and 100. That tells you how good or how bad the performance of your website is. Everyone, even those without a technical background, can understand that. And to be honest, that's the root of the problem.

Because the reality is not that simple. The web performance or user experience cannot be measured as a single number. In addition, there are a lot of nuances around how a Lighthouse audit works. And to use it as a reliable source of knowledge, you really have to be aware of those nuances. You won't read about those things in the audit summary. They recently added some information, but still, they are not complete. But don't worry. We will navigate through all the nuances. And at the end of this talk, you will know exactly where things are coming from and when not to trust them, which is even more important. So let's start with a simple question.

What does Google Lighthouse really measure? And how can we use this information? Because I have a feeling that many people don't try to answer it and just blindly assume that the score has to be high to be right, and that's all they need to know. So as we can read on the Lighthouse website, the goal of Google Lighthouse is to measure page quality. The audit divides the quality into four categories: performance, accessibility, best practices, and SEO. All of those combine to give you a good perspective on the quality of the website. By that, Lighthouse tries to accurately predict the real-world user experience. "Tries to predict" is accurate here because the audit will not give you any definitive answers about your users' experience.

Google has always promoted this performance score as the most important one. In the heads of the general audience, the Lighthouse score equals the performance score. So a quality page means a page with a high performance score. And don't get me wrong here. I think performance is definitely a major factor influencing page quality and user experience. But the fact that we got so obsessed with just one of the four metrics gives the false impression that the only thing that matters to the end user is performance. And if we get it right, our users will have a delightful experience. In reality, there are so many factors influencing good user experience that even all four Lighthouse metrics can't give you a definitive answer if it is good or if it is bad.

I assure you that the best way to start improving the experience is just talking to your users and learning how they use your app and what they are struggling with. And if you want to use the data from Lighthouse in your decision-making process, you need to put it in the right context. Without knowing the context, it's very easy to make bad decisions that logically could seem correct, but they are not. BMI, Body Mass Index, I think is a great analogy to the Lighthouse score. So does a 30 BMI mean that you're obese? Well, if you look at the chart, we can say with full confidence, yes, you are. But when we dig into the details of how the results should be interpreted, we learn that this scale doesn't work for a very large amount of people.

Older adults, women, muscular individuals: the interpretation for them is different. But the list doesn't end here. The interpretation for children and adolescents is also different. So the only group that comes to mind that actually fits into the BMI algorithm is non-muscular middle-aged males. So the initial results, if you only look at the scale, could lead you to bad decisions if you don't dig into those specific details. This kind of thinking could lead to real disasters. For example, if you look at this chart, we can quickly jump to the conclusion that we can put an end to horrible things just by banning worldwide cheese consumption. You don't want that to happen, right? Making business decisions based on a raw number without broader context usually leads to bad decisions. But you can make even worse ones if you don't even know where this number comes from.

So let me quickly explain how the Lighthouse score is calculated to make sure we are all on the same page. The Lighthouse score is calculated from a bunch of other metric scores, and each of those metrics has its own weight. Some of them are more important, some of them are less important. The algorithm changes with each version, and this is super, super important to acknowledge. This is nothing unusual, honestly, because the more Google explores the impact of individual metrics on the user experience, the more accurate the weights are. But this is why it makes no sense to compare your current Lighthouse score to the one from a year or two ago. Most likely, the scoring system has changed during this time, and your score could have improved or decreased because of the algorithm change, not because you changed something. I've seen many people who made that mistake and were convinced that their website was slower than before, even though it wasn't.
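To make the weighting point concrete, here is a minimal Python sketch of a weighted performance score. It is not Lighthouse's actual implementation: it starts from per-metric scores that are already normalized to 0..1, the metric scores are invented, and the two weight sets are assumptions loosely modeled on published Lighthouse weightings. Check the scoring documentation for the version you actually run.

```python
# Illustrative only: the per-metric scores and both weight sets are assumptions.
METRIC_SCORES = {  # already normalized to 0..1, where 1 is best
    "fcp": 0.90, "si": 0.80, "lcp": 0.52, "tti": 0.60, "tbt": 0.40, "cls": 1.00,
}

# Two hypothetical weightings, standing in for an older and a newer scoring version.
WEIGHTS_OLDER = {"fcp": 0.10, "si": 0.10, "lcp": 0.25, "tti": 0.10, "tbt": 0.30, "cls": 0.15}
WEIGHTS_NEWER = {"fcp": 0.10, "si": 0.10, "lcp": 0.25, "tbt": 0.30, "cls": 0.25}


def performance_score(metric_scores, weights):
    """Weighted average of per-metric scores, scaled to 0..100."""
    total_weight = sum(weights.values())
    weighted_sum = sum(metric_scores[name] * weight for name, weight in weights.items())
    return round(100 * weighted_sum / total_weight)


if __name__ == "__main__":
    # Same page, same measurements, different scoring version.
    print("older weights:", performance_score(METRIC_SCORES, WEIGHTS_OLDER))
    print("newer weights:", performance_score(METRIC_SCORES, WEIGHTS_NEWER))
```

The page and its measurements are identical in both calls; only the weights differ, and so does the resulting score. That is why a score from an older report is not comparable with one from today.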

So always look at specific metrics, those low-level metrics that are measured, not the Lighthouse score, because the score is an algorithm that keeps changing. Let's dig deeper into understanding where this magical performance score comes from. We already know half of the truth behind this number, which is what metrics contribute to it, but we still don't know about the second most important thing: the environment the test is performed in. And it matters a lot. Most people use Chrome DevTools to run their Lighthouse audits, and this is probably the least reliable way of doing that. There are multiple external factors that influence the score. The download and rendering speed depend on your CPU and network. Developers who run the test on their shiny MacBooks with 5G internet will, in most cases, get better results than real users.

In addition, your browser extensions are also treated as part of the website you're auditing. We can decrease the impact of these factors by running the test in incognito mode, which will exclude extensions, and by applying CPU and network throttling, but we'll still see different results on different devices. And honestly, running Lighthouse locally is not reliable if you want to compare the results with anyone else. It's definitely not reliable if you want to tell whether the supposed end-user experience will be good or bad.

Honestly, everyone in the company can run the test, and I can guarantee you everyone will have different results because the tests were performed in different environments. You should always test your website in the same environment and limit the external factors to an absolute minimum. Otherwise, it's just not a reliable test. You can get more consistent results if you set up Lighthouse CI in an external environment. Lighthouse CI is, well, Lighthouse running as part of your CI on some external machine. Or you can use external tools like SpeedCurve. But if you need to quickly inspect a website, I suggest taking a look at PageSpeed Insights.
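One common way to tame the remaining run-to-run noise, even in a fixed environment, is to audit the same page several times and report the median instead of a single run. The sketch below assumes you have already collected the scores from repeated runs; the function name and the sample numbers are made up for illustration.

```python
from statistics import median


def summarize_runs(scores):
    """Summarize performance scores from repeated audits of the same page."""
    if not scores:
        raise ValueError("at least one run is required")
    return {
        "median": median(scores),             # robust against a single outlier run
        "min": min(scores),
        "max": max(scores),
        "spread": max(scores) - min(scores),  # how noisy the environment is
    }


if __name__ == "__main__":
    # Hypothetical scores from five runs in the same environment.
    print(summarize_runs([71, 64, 78, 69, 73]))
```

A large spread is itself a signal: it suggests the environment is too noisy for the numbers to be compared with anyone else's.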

PageSpeed Insights is a website made by Google itself, and you can use it to perform quick Lighthouse tests on basically any website, remotely in a Google data center instead of on your local machine. PSI, PageSpeed Insights, usually uses the data center that is closest to your location. Sometimes it could use another one if the closest one is under heavy load. This is why you could sometimes get slightly different results for the same page in tests made one after another, though recently they started caching the results. So if you really want to run multiple tests and see different results, you have to run another one in incognito after quite some time. Even though the PSI score will be more consistent, it's still far from being an accurate measurement of the real user experience.

PageSpeed Insights will run a mobile Lighthouse test on an emulated mid-range Motorola phone. So unless all of your users have that device, they are going to experience the website differently. The good news is PageSpeed Insights will also tell you how your website performs in the real world. At the very top of every audit, you will see the Core Web Vitals, which have to be green to positively impact your SEO results, and three other important performance metrics that are collected from your users but don't affect SEO. We'll explore that in depth soon. For now, just remember that PSI gives you two types of results: the real-world result and the synthetic result, which is the result of running the test in the Google data center.
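The two result types can also be read programmatically. The sketch below builds a request URL for the public PageSpeed Insights API (v5) and splits a response into its lab and field parts. The field names (`lighthouseResult`, `loadingExperience`, `overall_category`) reflect my understanding of the v5 response format and should be checked against the API reference. The sample response is hand-written for illustration, not real data.

```python
from urllib.parse import urlencode

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"


def build_psi_url(page_url, strategy="mobile"):
    """Build the request URL for a PageSpeed Insights audit."""
    return PSI_ENDPOINT + "?" + urlencode({"url": page_url, "strategy": strategy})


def split_lab_and_field(response):
    """Separate the synthetic (lab) score from the real-user (field) data."""
    lab_raw = response["lighthouseResult"]["categories"]["performance"]["score"]
    field = response.get("loadingExperience", {})  # may be empty for low-traffic pages
    return {
        "lab_score": round(lab_raw * 100),  # the API reports the score as 0..1
        "field_category": field.get("overall_category"),
        "field_metrics": {
            name: metric.get("category")
            for name, metric in field.get("metrics", {}).items()
        },
    }


if __name__ == "__main__":
    # Hand-written sample shaped like a PSI response; the values are invented.
    sample = {
        "lighthouseResult": {"categories": {"performance": {"score": 0.81}}},
        "loadingExperience": {
            "overall_category": "SLOW",
            "metrics": {"LARGEST_CONTENTFUL_PAINT_MS": {"percentile": 4300, "category": "SLOW"}},
        },
    }
    print(build_psi_url("https://example.com/"))
    print(split_lab_and_field(sample))
```

Keeping the lab score and the field category side by side makes it harder to mistake one for the other when reporting results.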

Another problem with Lighthouse is the fact that it is just an algorithm. It gets input, processes the data, and gives output. If we know how it works, we can easily cheat it, and that happens more often than you think. You can easily find a lot of articles like this one showing how to build a website that gets a perfect Lighthouse score in a particular category while delivering a horrible experience. It's equally easy to trick the performance score. You can detect the Lighthouse user agent and serve a different version of your website to auditing tools. For example, you can quickly remove all script tags from your server-side rendered application and voilà! Performance improves to almost 100 because JavaScript is usually the bottleneck.

In fact, there are companies out there doing exactly that, and there are a lot of them, trust me. So if the score is suspiciously high compared to the website's complexity and the real loading experience, there is a high chance that someone is trying to trick you. After hearing my presentation, of course, you could get the impression that, in my opinion, Lighthouse is useless, and this is definitely not my point here. I think it's a wonderful tool, and the fact that Google is trying to find an easy way to help developers identify potential performance bottlenecks impacting user experience is definitely worth supporting. The problem is not the tool itself but the way we use it, or to be more precise, how we misuse it. And, you know, I have no doubt that Google Lighthouse has contributed to a faster web more than any other tool, but the name Lighthouse has a purpose. Its goal is to guide you in improving the page quality, not to give a definitive answer on whether it's good or bad. I've seen websites with a great user experience and low Lighthouse scores, and vice versa. Good performance is a trade-off. You always have to sacrifice something to make it better.

Sometimes it's an analytic script, sometimes it's a feature. It's not always a good business decision to get rid of some of them to have better performance. You have to weigh it. To me, Lighthouse shines the most when we want to quickly compare different versions of our websites to see if there was an improvement or maybe a decrease. It's definitely worth implementing Lighthouse checks in your CI/CD pipelines with Lighthouse CI. You should also audit websites with similar complexity from your industry to get a sense of the realistic score in your case that you should be aiming for.

Because an e-commerce website rarely scores above 60, while a blog often hits 100. So it's important to know what a good score is in your case. Lighthouse is not a good tool to measure actual experience. Synthetic data, so the data coming from Lighthouse, will never tell you anything about that. To understand your users, you need to collect data from your users. And you don't have to set up any additional monitoring tools to check that. If you audit your page on PageSpeed Insights, at the very top, you'll see how it is scoring against the four most important performance metrics. And that will also affect your SEO results.

So the data is collected from the past 30 days on devices using Chrome among real users of your website. So keep that in mind when comparing the score before and after the update. The only requirement for collecting this data, really, is to make your website crawlable. So it basically comes out of the box. This real-world data comes from something that is called the Chrome User Experience Report, which collects performance metrics from real user devices using Google Chrome.
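Field data is a distribution over many real page loads, not a single run, and it is typically summarized at the 75th percentile. The sketch below illustrates that idea. The thresholds are the "good" and "poor" boundaries Google publishes for LCP and CLS as I understand them; the talk does not list them, so treat them as assumptions. The samples are invented, and the nearest-rank percentile is only one of several percentile definitions, not necessarily the one CrUX uses.

```python
import math

# Assumed thresholds: (upper bound for "good", upper bound for "needs improvement").
THRESHOLDS = {
    "lcp_ms": (2500, 4000),
    "cls": (0.1, 0.25),
}


def percentile_75(samples):
    """75th percentile using the nearest-rank method."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(0.75 * len(ordered))
    return ordered[rank - 1]


def rate(metric, value):
    """Classify a metric value against its thresholds."""
    good, needs_improvement = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value <= needs_improvement:
        return "needs improvement"
    return "poor"


if __name__ == "__main__":
    # Invented LCP samples in milliseconds from eight page loads.
    lcp_samples = [1200, 1800, 2100, 2400, 2600, 3900, 5200, 1500]
    p75 = percentile_75(lcp_samples)
    print(p75, rate("lcp_ms", p75))
```

In this invented sample, more than half of the loads are individually "good", yet the 75th percentile is not. A single fast lab run on a developer machine would hide exactly that.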

So by default, when you're using Google Chrome, this data is sent to the Chrome User Experience Report. You can turn it off in the settings, but why would you, right? And you can easily get access to the history of your metrics in the CrUX dashboard in Google Data Studio. Now, I think we have a good perception of how Lighthouse works, how it could be tricked, and when not to use it. But before I finish: I keep talking about the difference between so-called lab data, which comes from Lighthouse and is measured in a specific environment, and the real-world data of your users.

And it's important to not optimize the former. And here is why. Here's an example. If you run a PageSpeed Insights benchmark on a few websites, you will quickly notice that in many cases there is no relation between the Lighthouse score and the performance data from the users. For example, here you can see a screenshot from an audit of the Adidas website. The Lighthouse score is rather high: 81 for an e-commerce site. Yeah, it is really high. Yet the real-world performance is terrible. Why? Maybe because they are cheating. Maybe because your users have worse devices and you have to optimize for them. You always have to know your users, really. That's the only solution. Don't look at the Lighthouse score on its own. It's a good indicator, but nothing else. So thank you so much. I hope you enjoyed this talk. If you would like to learn a little bit about things that I'm working on, I'm writing about them on the user from developer newsletter. Also, if you care about performance and would like to learn more about it, you can follow me on Twitter. Thanks so much for inviting me to this event and have fun. Bye-bye.
