How Zalando's overall site speed improved by more than 25% in five months
We all know that providing a fast user experience is key. Still, it was something of a wake-up call for us last fall when we saw our aggregated loading time increasing. This was not because of higher latency in our systems but simply because the share of mobile visits kept growing. By now, over 75% of our traffic comes from mobile devices (split nearly equally between app and web). And customer expectations are rising, especially on mobile!
We took this wake-up call as an opportunity to explore the impact of site speed in more detail. Yes, at Zalando every millisecond of latency counts, but what’s the concrete impact of another 100 ms improvement? We analyzed the correlation between loading time and revenue per session across every step of the user journey and for every device. The pattern was very clear and consistent, even if the size of the effect varied: shorter loading times go hand in hand with higher revenue per session. An A/B test brought the final confirmation: a 100 ms improvement in loading time led to a 0.7% uplift in revenue per session.
At Zalando, we live our values by setting bold expectations and making them highly visible to everyone in the company. Our ambition was a 20% loading time improvement in the first half of 2018. We’re excited that our efforts paid off: we reached a 25% improvement in our overall loading time within five months. We’re thrilled that this was recognized in Google’s “Mobile Speed Leaderboards” study, which rated us the fastest mobile site in fashion retail. Below, we share how we achieved this.
Given Zalando’s size, with hundreds of engineering teams and a breakneck pace of development, some teams are entirely self-sufficient when it comes to managing their performance, while others benefit from a crash program to eliminate bottlenecks. That’s where Mission Control comes in: targeted engagements between engineering teams and Zalando’s Site Reliability Engineering (SRE) program. Our site reliability engineers roll up their sleeves and apply their specialized experience to achieve immediate results, while providing the tools teams need to manage their performance themselves after the engagement ends.
Over the last few months, we have focused especially on optimizing render time and time to interactive on our website. On almost every step of the user journey, the engineers reduced the time to interactive by decreasing the amount of code that has to be executed. This sounds obvious, but depending on the chosen technology, it is not always easy to implement.
Our engineers from the Search and Browse team started the optimization by profiling their front-end components with React’s component-level profiling, which was introduced in React 15.4.0 and is turned on by default in React 16 development builds. It shows the rendering time (mount and update) of each component and warns about possible performance bottlenecks, such as updates triggered in lifecycle methods. This was a killer feature for us. Although it is only available in development builds, the relative rendering times closely resemble those of the production build.
Combined with the Performance tab in Chrome DevTools, it helped us identify the bottlenecks.
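React’s development-mode profiling reports its measurements through the browser’s User Timing API, which is why they appear in the Performance tab. A minimal sketch, assuming you also want to time non-React code paths (for example, mapping API data into props) in the same view: the hypothetical helper `timed` wraps a function call with `performance.mark` and `performance.measure`.

```typescript
// Hypothetical helper (not part of React or any library): time a synchronous
// function with the User Timing API so the measurement shows up in Chrome's
// Performance tab next to React's own development-mode entries.
function timed<T>(label: string, fn: () => T): T {
  const startMark = `${label}:start`;
  const endMark = `${label}:end`;
  performance.mark(startMark);
  try {
    return fn();
  } finally {
    // Record one measure spanning both marks, even if fn threw,
    // then drop the raw marks to keep the timeline tidy.
    performance.mark(endMark);
    performance.measure(label, startMark, endMark);
    performance.clearMarks(startMark);
    performance.clearMarks(endMark);
  }
}
```

For instance, `timed("map-catalog-props", () => toProps(apiResponse))` would show up as a `map-catalog-props` entry in the timeline; `toProps` and `apiResponse` are placeholders for your own code.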
On mobile and tablet, react-lazyload for product images was triggering two reflows. The Catalog page renders eight products on the server side and 76 products on the client side. The second reflow took a very long time because the browser had to calculate the layout of a large area of the screen for the 76 newly rendered products. We removed the lazy loading and implemented Low Quality Image Placeholders (LQIP) instead, to avoid the reflows entirely.
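The LQIP idea can be sketched as follows, under assumptions that are ours rather than a description of Zalando’s component: each product image comes with a tiny pre-generated placeholder URL and known intrinsic dimensions, and the hypothetical `renderLqipImage` emits markup whose wrapper reserves the final box size up front. Because the box never changes size, swapping the placeholder `src` for the full image (kept in `data-src` here) does not move anything on the page.

```typescript
// Hypothetical shape of a product image; the placeholder is a tiny, blurred version.
interface ProductImage {
  fullUrl: string;
  placeholderUrl: string;
  width: number; // intrinsic width of the full image in pixels
  height: number; // intrinsic height of the full image in pixels
  alt: string;
}

// Minimal escaping for values interpolated into HTML attributes.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// The padding-bottom trick reserves a box with the image's aspect ratio, so the
// layout is final on first render and the later image swap causes no reflow.
function renderLqipImage(img: ProductImage): string {
  const ratioPercent = ((img.height / img.width) * 100).toFixed(2);
  return (
    `<div style="position:relative;padding-bottom:${ratioPercent}%">` +
    `<img src="${escapeAttr(img.placeholderUrl)}" data-src="${escapeAttr(img.fullUrl)}" ` +
    `alt="${escapeAttr(img.alt)}" style="position:absolute;top:0;left:0;width:100%;height:100%">` +
    `</div>`
  );
}
```

A small client-side script would then copy each `data-src` into `src` once the page has loaded; that part is left out here.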
On desktop and tablet, react-virtualized in a product filter dropdown was triggering a reflow. The product filter component does not show anything until it is clicked, but it was rendered anyway to provide links for crawlers. We stopped rendering the hidden product filter component, which removed the reflow. For crawlers, we instead generate the links with string concatenation outside of React components.
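Generating the crawler links with plain string concatenation might look roughly like this. The `FilterOption` shape, the query-parameter layout, and where the resulting `<nav>` is placed in the page are all assumptions for illustration.

```typescript
// Hypothetical filter option as it might come from a catalog API.
interface FilterOption {
  label: string; // visible link text, e.g. "Black"
  param: string; // query parameter name, e.g. "colors"
  value: string; // query parameter value, e.g. "black"
}

function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Builds plain <a> links without React, so the interactive dropdown
// (and its virtualized list) never has to be rendered just for crawlers.
function renderCrawlerFilterLinks(basePath: string, options: FilterOption[]): string {
  let html = "<nav>";
  for (const option of options) {
    const href = `${basePath}?${encodeURIComponent(option.param)}=${encodeURIComponent(option.value)}`;
    html += `<a href="${escapeHtml(href)}">${escapeHtml(option.label)}</a>`;
  }
  return html + "</nav>";
}
```

String concatenation keeps this path cheap: no component tree, no virtual DOM diffing, and nothing for the client to hydrate.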
We identified libraries that were large but not really necessary for us, and we used tree shaking to eliminate dead code. Unfortunately, some CommonJS libraries did not work well with tree shaking, which relies on the static import and export structure of ES modules. In these cases, we removed the packages and chose a smaller alternative or wrote our own. We also found that some internal libraries were bundling their dependencies into their own builds with webpack. As a result, our bundle contained the same code multiple times, because npm’s deduplication mechanism could not detect dependencies that had already been inlined into another package’s bundle.
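One common way to keep an internal library from inlining its dependencies is webpack’s `externals` option, which accepts an array of regular expressions. The sketch below is an assumption about how such a library could be configured, not a description of Zalando’s setup. The hypothetical helper `externalsFromPackageJson` turns every declared dependency into an external, so the consuming app resolves those packages itself and npm can deduplicate them.

```typescript
// Minimal subset of package.json that the helper reads.
interface PackageJson {
  dependencies?: Record<string, string>;
  peerDependencies?: Record<string, string>;
}

function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// One RegExp per package, matching both "pkg" and deep imports like "pkg/sub/path",
// but not other packages that merely share a prefix (e.g. "react-dom" for "react").
function externalsFromPackageJson(pkg: PackageJson): RegExp[] {
  const names = new Set<string>([
    ...Object.keys(pkg.dependencies ?? {}),
    ...Object.keys(pkg.peerDependencies ?? {}),
  ]);
  return Array.from(names)
    .sort()
    .map((name) => new RegExp(`^${escapeRegExp(name)}(/.*)?$`));
}
```

In the library’s webpack config this would be used as `externals: externalsFromPackageJson(require("./package.json"))`, so matching imports stay as `require`/`import` calls in the library’s output instead of being copied into it.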
By applying this approach we reduced the overall size of our Header Fragment by 25% (36.6 KB -> 27.4 KB gzipped):
[Figures: Header Fragment (before) and Header Fragment (after)]
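Size figures like these are gzipped sizes. As one hedged way to reproduce such numbers for a built bundle, Node’s `zlib.gzipSync` can be used. The helpers below are hypothetical, assume 1 KB = 1024 bytes, and use zlib’s default compression level, so results may differ slightly from what a web server or CDN actually serves. The second helper checks the percentage quoted for the Header Fragment.

```typescript
import { gzipSync } from "node:zlib";

// Gzipped size of a bundle's contents in KB (assumption: 1 KB = 1024 bytes).
function gzippedKb(source: string | Buffer): number {
  return gzipSync(source).length / 1024;
}

// Relative reduction in percent; percentReduction(36.6, 27.4) is about 25.1.
function percentReduction(before: number, after: number): number {
  return ((before - after) / before) * 100;
}
```

In a build script this could be called as `gzippedKb(readFileSync("dist/header.js"))`, where the path is a placeholder for your own bundle.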
Because every byte counts, we also reduced the overall page size (number of DOM elements and size of JSON data such as props).
React’s client-side hydration needs the same props that were used for server-side rendering. These props are typically embedded into the HTML as JSON. In that JSON, we had some unnecessary properties in large arrays of objects that were passed through from backend APIs. Removing those unused properties reduced the page size by up to 17 KB gzipped.
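The pruning step can be sketched as an explicit projection applied before the props are serialized. The product fields below are hypothetical examples, not Zalando’s actual API. Escaping `<` is a standard precaution when JSON is embedded in a `<script>` tag, so that a value containing `</script>` cannot close the tag early.

```typescript
// Hypothetical backend product: a few fields the UI uses plus extras it never reads.
interface ApiProduct {
  sku: string;
  name: string;
  price: number;
  imageUrl: string;
  [extra: string]: unknown;
}

// Only the fields the catalog components actually render.
type CatalogProduct = Pick<ApiProduct, "sku" | "name" | "price" | "imageUrl">;

// Explicit whitelist: anything not listed here never reaches the HTML.
function toCatalogProduct(p: ApiProduct): CatalogProduct {
  return { sku: p.sku, name: p.name, price: p.price, imageUrl: p.imageUrl };
}

// Serialize hydration props for embedding in HTML; "<" becomes the JSON escape \u003c,
// which JSON.parse turns back into "<" on the client.
function serializeProps(props: { products: CatalogProduct[] }): string {
  return JSON.stringify(props).replace(/</g, "\\u003c");
}
```

A whitelist rather than a blacklist means that new fields added to the backend response later do not silently grow the page again.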
As the Zalando website uses SVG for icons, reducing the page size also meant optimizing SVGs. SVGO (SVG Optimizer) is a great tool for this. We had already been using it for a while, but we recently noticed that we had forgotten to apply decimal precision optimization, which limits the precision of floating-point coordinates. SVG images exported from graphics software usually contain numbers with far more precision than is needed to render pixels. After this optimization, the SVG size dropped by about 50%.
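To illustrate what decimal precision optimization does, here is a simplified sketch that rounds every number in SVG path data. This is not SVGO’s implementation; in SVGO, plugins such as `convertPathData` expose the setting as a `floatPrecision` parameter. The sketch also ignores the compact arc-flag syntax (such as `0110`) that a real optimizer must handle.

```typescript
// Illustration only: round every number in SVG path data to `precision` decimals.
// Tokens are single command letters or numbers (optional sign, fraction, exponent).
const PATH_TOKEN = /[a-zA-Z]|-?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?/g;

function roundPathNumbers(pathData: string, precision: number): string {
  const tokens = pathData.match(PATH_TOKEN) ?? [];
  let out = "";
  let prevWasNumber = false;
  for (const token of tokens) {
    const isNumber = !/^[a-zA-Z]$/.test(token);
    // toFixed rounds; Number() then drops trailing zeros ("30.50" -> 30.5), and String(-0) is "0".
    const text = isNumber ? String(Number(Number(token).toFixed(precision))) : token;
    // Adjacent numbers need a separator unless the next one starts with "-";
    // command letters delimit themselves.
    if (isNumber && prevWasNumber && !text.startsWith("-")) out += " ";
    out += text;
    prevWasNumber = isNumber;
  }
  return out;
}
```

For example, `roundPathNumbers("M10.123456 20.987654L30.5 40", 2)` returns `"M10.12 20.99L30.5 40"`. Across thousands of coordinates in exported icons, those dropped digits add up quickly.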
The biggest lesson from our optimization efforts is:
Remove as many of your dependencies as possible, keep the amount of code as small as possible, and your web page will be fast (again). A small and fast web page will make your customers happy and result in more conversions.
Looking to the future, SRE is making a number of improvements to make it easier for Zalando’s hundreds of engineering teams to manage their own performance. It starts with setting expectations through Service Level Objectives (SLOs) that are meaningful from the customer’s perspective. With expectations set, we measure our Service Level Indicators (SLIs) against them and dive deep to optimize bottlenecks, which is where distributed tracing comes in. With both expectations and deep instrumentation in place, we gain the ability to implement monthly error budgets that help engineering teams achieve operational excellence. The journey continues...
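To make the relationship between these terms concrete, here is a minimal sketch of a latency-based Service Level Objective, its Service Level Indicator, and the resulting error budget for one window such as a month. The threshold and target values are hypothetical inputs, and this is one common way to define them, not a description of Zalando’s SRE tooling.

```typescript
// Hypothetical latency SLO: a page load is "good" if it finishes within thresholdMs,
// and at least `target` of all page loads in the window must be good.
interface LatencySlo {
  thresholdMs: number;
  target: number; // e.g. 0.99
}

// SLI: share of page loads in the window that met the latency threshold.
function latencySli(loadTimesMs: number[], slo: LatencySlo): number {
  if (loadTimesMs.length === 0) return 1; // no traffic, nothing violated
  const good = loadTimesMs.filter((t) => t <= slo.thresholdMs).length;
  return good / loadTimesMs.length;
}

// Error budget: how many slow page loads the target tolerates, how many occurred,
// and what is left. Kept fractional; a negative `remaining` means the SLO was missed.
function errorBudget(loadTimesMs: number[], slo: LatencySlo) {
  const allowedBad = loadTimesMs.length * (1 - slo.target);
  const actualBad = loadTimesMs.filter((t) => t > slo.thresholdMs).length;
  return { allowedBad, actualBad, remaining: allowedBad - actualBad };
}
```

While budget remains, a team can keep shipping features; once it is used up, the budget signals that performance work should take priority.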
Join our tech team at Zalando.