Loading Time Matters

by Shuhei Kagawa, Jeff Cybulski, David Martin Jones, Thorbjoern Gruda, Christoph Luetke Schelhowe - 11 Jun 2018

How Zalando's overall site speed improved by more than 25% in five months

We all know that providing a fast user experience is key. Still, it was something of a wake-up call for us last fall when we saw our aggregated loading time increasing, not because latency in our systems had increased but simply because the share of mobile visits kept growing. By now, over 75% of our traffic comes from mobile devices (split nearly equally between app and web). And customer expectations are rising, especially on mobile!

We took this wake-up call as an opportunity to explore the impact of site speed in more detail. Yes, at Zalando every millisecond of latency counts, but what’s the concrete impact of another 100 ms improvement? We analyzed the correlation between loading time and revenue per session across every step of the user journey and for every device. The pattern was very clear and consistent (even if the size of the effect varied): shorter loading times go hand in hand with higher revenue per session. An A/B test brought the final confirmation: a 100 ms loading time improvement led to a 0.7% uplift in revenue per session.

At Zalando, we live our values by setting bold expectations and making them highly visible for everyone in the company. Our ambition was a 20% loading time improvement in the first half of 2018. We’re excited that our efforts paid off and we reached a 25% improvement on our overall loading time within five months. We’re obviously thrilled that this is noted in Google’s “Mobile Speed Leaderboards” study, which rated us as the fastest mobile site in fashion retail. We’d like to share how we achieved this.

Given Zalando’s size, with hundreds of engineering teams and a breakneck pace of development, some teams are entirely self-sufficient when it comes to managing their performance, while others embark on a crash program to eliminate bottlenecks. That’s where Mission Control comes in: targeted engagements with engineering teams and Zalando’s Site Reliability Engineering (SRE) program. Our site reliability engineers roll up their sleeves and apply their specialized experience to achieve immediate results, while providing the tools for self-management after the engagement ends.

Over the last few months, a special focus has been on optimizing the render time and the time to interaction of our website. At almost every step of the user journey, the engineers reduced the time to interaction by decreasing the amount of code that has to be executed. This sounds obvious, but it is not always easy to implement because of the chosen technology.

We identified an older React version as one of the reasons for a slow loading time. So our platform team updated the React version that we use from 15.6.1 to 16.2.0. This update was solely responsible for improving the JavaScript execution time by over 100 milliseconds.

Our engineers from the Search and Browse team started the optimization by profiling their front-end components with React’s component-level profiling, which was introduced in 15.4.0 and is turned on by default in React 16. It shows the rendering time (mount and update) of each component and warns about possible performance bottlenecks such as updates triggered in lifecycle methods. This was a killer feature for us. Even though it is only available in the development build, the relative proportions of rendering times resemble those of the production build.

Combined with Chrome’s Performance Tab, it helped us to identify the bottlenecks.

When we looked into the profiling results, it was clear that reflows were the biggest bottleneck. The purple boxes are reflows in JavaScript execution on production.

On mobile and tablet, react-lazyload for product images was triggering two reflows. The Catalog page renders eight products on the server side and 76 products on the client side. The second reflow took a very long time because it calculated the layout of a large area of the screen for the 76 newly rendered products. We removed the lazyload and implemented Low Quality Image Placeholders (LQIP) instead to avoid the reflow altogether.
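The markup side of the LQIP idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the production component: the `ProductImage` shape, the `data-full-src` attribute, and the `lqipMarkup` helper are hypothetical names. The point is that every image is rendered immediately with fixed dimensions and a tiny placeholder, so swapping in the full image later does not change the layout.

```typescript
// Hypothetical shape of the data needed per product image.
interface ProductImage {
  fullSrc: string;        // URL of the full-quality image
  placeholderSrc: string; // tiny, heavily compressed version (e.g. a data URI)
  width: number;          // intrinsic width in pixels
  height: number;         // intrinsic height in pixels
  alt: string;
}

// Escape characters that are unsafe inside a double-quoted HTML attribute.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Render the <img> with explicit dimensions and the placeholder as its
// initial source. The box size is known up front, so replacing `src`
// with the full image later does not move the surrounding elements.
function lqipMarkup(image: ProductImage): string {
  return (
    '<img src="' + escapeAttr(image.placeholderSrc) + '"' +
    ' data-full-src="' + escapeAttr(image.fullSrc) + '"' +
    ' width="' + image.width + '" height="' + image.height + '"' +
    ' alt="' + escapeAttr(image.alt) + '">'
  );
}
```

A small client-side script (not shown) would then load the URL stored in `data-full-src` and assign it to `src` once the download has finished.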

Before (Mobile):

After (Mobile):

On desktop and tablet, react-virtualized for a product filter dropdown was triggering a reflow. The product filter component does not show anything until it is clicked, but it was rendered anyway to provide links for crawlers. We stopped rendering the hidden product filter component, which removed the reflow. For crawlers, we prepared links generated with string concatenation outside of React components.
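Generating crawler links by string concatenation could look roughly like this. It is a sketch only: the `FilterOption` shape and the `renderCrawlerLinks` helper are hypothetical, and the real markup may differ.

```typescript
// Hypothetical shape of one filter option.
interface FilterOption {
  label: string; // text of the link, e.g. "Sneakers"
  path: string;  // URL path of the filtered catalog page
}

// Escape text so it is safe in HTML content and in quoted attributes.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the links with plain string concatenation. No React component
// is mounted, so nothing is measured or laid out for the hidden filter.
function renderCrawlerLinks(options: FilterOption[]): string {
  let html = "";
  for (const option of options) {
    html +=
      '<a href="' + escapeHtml(option.path) + '">' +
      escapeHtml(option.label) +
      "</a>";
  }
  return html;
}
```

The resulting string can be embedded into the server-rendered HTML as is, which keeps the links available to crawlers without the cost of rendering the dropdown.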

Before (Desktop):

After (Desktop):

As a result, we managed to reduce the JavaScript execution time of Catalog by about 200 milliseconds on desktop and about 300 milliseconds on mobile devices at the 90th percentile.
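For readers less familiar with percentiles: the 90th percentile is the value that 90% of measurements do not exceed, so it reflects the slower sessions rather than the average one. A minimal sketch using the nearest-rank method follows; the `percentile` helper is an illustration, and the method used by the actual monitoring setup is not stated here.

```typescript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error("no samples");
  }
  if (p <= 0 || p > 100) {
    throw new RangeError("p must be in the range (0, 100]");
  }
  // Copy before sorting so the caller's array is left untouched.
  const sorted = samples.slice().sort((a, b) => a - b);
  // Multiply before dividing to avoid rounding errors such as 0.7 * 10.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}
```

With ten execution-time samples, `percentile(samples, 90)` returns the ninth-smallest value, so a single extreme outlier does not dominate the figure.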



Another optimization was reducing the bundle size. Sending only the code that is necessary improved performance significantly. In the end, each byte counts, because JavaScript is expensive for the browser to process. Also, a surprising number of visitors arrive without a warm cache, so it’s important to keep JavaScript bundles as small as possible. To identify where to look and where the best results could be achieved, we used webpack-bundle-analyzer.

We identified libraries that are large but not really necessary for us, and we used tree shaking to eliminate dead code. Unfortunately, some CommonJS libraries did not work well with tree shaking. In these cases, we removed the packages and either chose a smaller alternative or wrote our own. We also found that some internal libraries were bundling their dependencies into their own bundles with webpack. This caused our bundle to contain the same code multiple times, because npm’s deduping mechanism couldn’t detect the duplication.

By applying this approach we reduced the overall size of our Header Fragment by 25% (36.6 KB -> 27.4 KB gzipped):

Header Fragment (before):

Header Fragment (after):

Because each byte counts, we also reduced the total page size (the number of DOM elements and the size of JSON data such as props).

React client-side hydration needs the props that were used for server-side rendering. The props are typically embedded into the HTML as JSON. In this JSON, we had some unnecessary properties in large arrays of objects that were passed through from backend APIs. Removing those unused properties reduced the page size by up to 17 KB gzipped.
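Stripping unused properties before the props are embedded can be sketched like this. All field names (`warehouseId`, `internalFlags`, and so on) and both helpers are hypothetical; the sketch only shows the principle of mapping the API response to the fields the components actually read.

```typescript
// Hypothetical product object as returned by a backend API.
interface ApiProduct {
  sku: string;
  name: string;
  price: string;
  imageUrl: string;
  warehouseId: string;     // not used by any component
  internalFlags: string[]; // not used by any component
}

// The subset of fields the components actually need.
interface ProductProps {
  sku: string;
  name: string;
  price: string;
  imageUrl: string;
}

// Copy only the needed fields instead of passing the object through.
function toProductProps(product: ApiProduct): ProductProps {
  return {
    sku: product.sku,
    name: product.name,
    price: product.price,
    imageUrl: product.imageUrl,
  };
}

// Serialize the slimmed-down props for embedding into the HTML.
// "<" is escaped so the JSON cannot close a surrounding <script> tag.
function serializeProps(products: ApiProduct[]): string {
  const props = { products: products.map(toProductProps) };
  return JSON.stringify(props).replace(/</g, "\\u003c");
}
```

Because the mapping is applied to every element of a large array, even a couple of dropped fields per object add up quickly.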

As the Zalando website uses SVG for icons, part of reducing the page size was SVG optimization. The SVG Optimizer (SVGO) is a great tool for optimizing SVG images. We had already been using the tool for a while, but recently we noticed that we had forgotten to optimize the decimal precision, which specifies the precision of floating-point coordinates. SVG images exported from graphics software usually contain numbers with more decimal places than are needed to render pixels. With this optimization we reduced the SVG size by about 50%.
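To illustrate why decimal precision matters, here is a small sketch that rounds the coordinates in SVG path data. It is not how SVGO is implemented (SVGO does considerably more), and the `roundPathNumbers` helper is a hypothetical name. The sketch assumes that numbers in the path are separated by spaces, commas, or command letters, as is typical for files exported from graphics software.

```typescript
// Round every decimal number in an SVG path string to `precision`
// decimal places. Assumes numbers are separated by whitespace, commas,
// or command letters (no compact forms such as "1.5.5").
function roundPathNumbers(pathData: string, precision: number): string {
  return pathData.replace(/-?\d*\.\d+/g, (match) => {
    // toFixed rounds; converting back to a number drops trailing zeros.
    const rounded = Number(Number(match).toFixed(precision));
    return String(rounded);
  });
}
```

For example, with a precision of 2 the path `M12.34567 8.00012 L-0.98765,3.14159` shrinks to `M12.35 8 L-0.99,3.14`, which is visually identical at icon sizes.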

The biggest lesson we learned from our optimization efforts is:

Remove as many of your dependencies as possible, keep the amount of code as small as possible, and your webpage will be fast (again). A small and fast webpage will make your customers happy and will result in more conversions.

Looking to the future, SRE is making a number of improvements to make it easier for Zalando’s hundreds of engineering teams to self-manage their performance. It starts with setting expectations through Service Level Objectives that are meaningful from the customer perspective. With expectations set, we measure our Service Level Indicators against those expectations and dive deep to optimize bottlenecks; that’s where distributed tracing comes in. With expectations and deep instrumentation, we gain the ability to implement monthly error budgets to help engineering teams better achieve operational excellence. The journey continues...

Join our tech team at Zalando.
