**Exploring the Zalando Assortment by Browsing a Product Similarity Graph**

### Introduction

As Europe's leading online fashion and lifestyle platform, Zalando is continually developing new features to enable our customers to find the products they want. While the standard tools of Search, Categorization & Attribute Filtering are par-for-the-course for purchasing items online, with an ever-expanding fashion assortment and an increase in the data available to describe a product, this browsing experience is becoming more cumbersome and time-consuming, particularly on mobile devices.

At Zalando's Fashion Insights Centre in Dublin, while keeping a focus on developing AI and Big Data driven products and features in the medium term, we sometimes have time to explore new ideas with a longer-term vision. Either through our annual *Hackweek* (an internal week long hackathon) or our *Slingshot* programme (an “intrapreneurship” program fostered by Zalando's Innovation Lab). In this blog post we will share with you a project that has journeyed through, and benefited from, both programmes, and present a new method for browsing an online fashion catalog using a *Product Similarity Graph*.

### Product Similarity

What do we mean when we say that two products are similar? Do we mean that the products are from the same fashion trend, that they appear visually similar, or that they have a number of attributes in common, for example brand or product type? In fact it can be all of these things, or a select few, summed up to create an overall similarity score between two products, using all the data that makes sense for the task at hand.

When looking at the Zalando catalog in total, what does product similarity mean here? Well, it means calculating the previously described similarity score for each product against all others, (sometimes referred to as a similarity self-join) and storing the similarity scores in a suitable way. Typically, the scores are represented in a matrix format, which leads to the construction of a *Product Similarity Matrix*, where each row contains the similarity scores for one product against all others in the catalog, likewise for the columns since the matrix is *symmetric*.

Many of you will notice a potential pitfall as the catalog grows, which is, as the number of products, *n*, increases, the number of scores required to be generated will increase by *n* for every new product added to the catalog, and as such, the algorithmic complexity of generating a Product Similarity Matrix is *n*-squared, i.e., *O(n**2)*. Depending on the use case the catalog size could be in the millions. To make this problem manageable we use distributed systems and algorithms such as *Locality Sensitive Hashing*. However, we will spare the details for our purposes here, for now just consider that our Product Similarity Matrix is big. We use the Product Similarity Matrix within Zalando to provide a Product Similarity Service, which is currently used by our recommendation team to tackle the cold-start problem associated with Collaborative Filtering.

### Product Similarity Graph

The Product Similarity Matrix, due to its construction, can be easily interpreted as an *Adjacency Matrix*, which in turn can be interpreted as a *Network Graph*, where products in the catalog are represented as nodes, and the similarity relationship between products is represented by connections between nodes. Network graphs are interesting mathematical objects and appear in a number of different areas of Computer Science, such as the study of online social networks (the social graph) and the study of internet traffic (communications graph). Here we are interested in the relationships between products in a fashion catalog, and we use a network graph to organise and store Zalando’s product data.

Below, we present a visualization of a Product Similarity Graph for a small number of the products sold by Zalando. As is typical for graph visualizations we can see a rich structure emerge, where clusters of very similar products form. These clusters correspond to high-level product attributes such as product type, with similar products being close together, e.g., low shoes are clustered close to boots but far away from trousers. Within clusters, other more detailed low-level attributes such as color, materials and styles create a distinction between products. Other connections between clusters are far apart, which indicate a weak similarity relationship between products. However, these connections offer an opportunity to browse and explore other parts of the graph and hence other items in the catalog, offering both a *hunting* and *exploring* mode of product discovery. Finally, it important to note that the graph is not fully connected, since many product pairs will exhibit no or low similarity.

*Figure 1: A visualization of the Product Similarity Graph for a small selection of items from the Zalando Catalog. *

### Browsing via Graph Traversal

From looking at the visualization above it is easy to imagine traversing through the graph to find similar products of interest by choosing appropriate connections, quickly looking at one product then another until you find something you would like to buy. In a similar way as you might do in a bricks and mortar fashion store, where you start browsing in the trousers section, purchase an item, and decide to buy a matching shirt in another part of the store. However, how would we implement this feature? How do we enable a customer to discover products by browsing a network graph?

Network Graphs have been studied for many years and there exists many different algorithms to extract information from a graph, including algorithms to analyse the structure of a graph and algorithms to determine the optimal path through a graph between two nodes. However, for our use case where we would like to use a graph to drive a browsing experience – a scenario that has no predefined terminating node – there has not been much previous work. With this in mind, we present here a new *Graph Exploration Algorithm* called *Graph Browser* to enable browsing on a Product Similarity Graph, and provide a solution to the technical issues with browsing & exploring a graph in general.

Introducing the *Graph Browser* Algorithm

The Graph Browser algorithm enables browsing on a graph by generating a unique set of *Navigation Tags* for a product of interest on the graph, which we call the anchor product. The tags are generated directly from the product data neighbouring the anchor product and describe product attributes such as color, product type, brand etc. Furthermore, the tags indicate *attribute differences* between the anchor product and its neighbours in the graph, and allow the user to browse to a neighbouring product in the graph by simply selecting a tag. Common product attributes, such as color, may reference many neighbouring products, resulting in the tag referencing many possible products, while other tags may reference a single product. For the later case, the new product becomes the anchor product, for the former, the user chooses a single product to be the new anchor. Once a new anchor product is selected a new set of navigation tags are generated, and the process is repeated until the customer finds a product they would like to buy or exits the process. In this way, the browsing experience is only ever concerned with products that are in the direct neighbourhood of the anchor product, and the graph is traversed one connection at a time, where each product visited in turn is similar to the previous but differing by the selected tag. To explain further we provide a diagram of the process in figure 2. below.

Figure 2: A diagram of a Product Similarity Graph and its Navigation Tags (presented in curly braces) as generated by the Graph Browser Algorithm. Starting at P1, “black nike textile shoe,” one possible path to P3, “white fila leather shoe,” would be to select the following navigation tags in succession: “white’ & ‘fila,” where selecting “white” will assign P2 the new anchor, and selecting “fila” will move the anchor to P3. Alternatively, selection of the “white” and “leather” tags would also bring the customer to P3, based on their preference for a leather shoe, over brand preference.

To explain more formally, we present some details on the Graph Browser algorithm below, but first define some preliminaries: A Product Similarity Graph is constructed of product nodes, P = {p1,...,pn}, and connections between nodes that represent the similarity score between two products pi & pj, sij = sim_score(pi, pj). Each node has a record of all the product attributes for that product, pi = {a1,...,am}, which are used to generate the navigation tags. The algorithm is as follows,

- A single node in the graph is selected as the anchor node, pi.
- A set of all connected nodes to pi is constructed, Pcon = {p1,...,pk}.
- A mapping is constructed of the attribute differences between the anchor node, pi, and the attributes of the products contained in the set of connected nodes, M = diff_attrs(pi, Pcon), where M[pj] returns the set of attributes that differ, Dij = {a1,...,ad}, between products pj & pi.
- We construct a mapping of single attributes, or tags, to the products they reference by inverting our attribute difference mapping, and indexing attributes individually to the products they belong to, Q = tag_map(M), where products in Pcon that have common differing attributes are indexed by the same tag,
*i.e.*, the mapping Q[ai] returns the connected product, or products, that the tag references in the graph. - The set of navigation tags, T = {t1,...,tp}, for the anchor, pi, corresponds to the keys of the mapping Q. The user selects a tag, ts, from T and the new anchor product is selected from the indexed set of products, Q[ts], where pi now becomes Q[ts] if there is a single product indexed, or is selected by the user if there are more than one option.
- The algorithm returns to step 1 and the process repeats.

which completes the description of the Graph Browser algorithm. We present a small Python code implementation of the Graph Browser algorithm at the end of this article.

## Other Considerations

You will notice above that the initialization of the anchor node, pi, is not defined in the algorithm, however choices include: Random initialization, user-specific recommendations and search query initialization. Furthermore, we do not discuss how navigation tags are presented to the user, which can be presented using a ranking function that optimizes the positioning of the tags, for example by using the similarity scores themselves or using a customer’s preferences. Moreover, for the use case we present here we are interested in differences between products, however we could also generate tags that represent common product attributes.

## Finally

The Product Similarity Graph, Graph Browser algorithm and the Navigation Tags it generates all combine to produce a quick and easy way to browse an online product catalog. While we are only beginning to explore the possibilities of the Product Similarity Graph within Zalando, we are hopeful that it will be be used to drive some of our product discovery and tag-based navigation features in the future.

### Appendix: Python Graph Browser Implementation

import networkx as nx

# Define product records

products = [

# ID, Brand, Type, Material, Color

['P1', 'nike', 'shoe', 'textile', 'black'],

['P2', 'nike', 'shoe', 'textile', 'white'],

['P3', 'fila', 'shoe', 'leather', 'white'],

['P4', 'fila', 'sock', 'synth', 'blue']

]

# Use jaccard index as similarity score

def jaccard_index(x, y):

"""Returns the jaccard index between `x` & `y`.

"""

x = set(x); y = set(y)

return len(x.intersection(y)) / len(x.union(y))

def product_similarity_matrix(products, sim_score=jaccard_index):

"""Returns a Product Similarity Matrix for specified`products`

&

similarity score,`sim_score`

.

"""

prod_sim_mat = {}

# n(n-1)/2 scores

for i, product_i in enumerate(products):

for product_j in products[:i]:

idi, *attr_i = product_i

idj, *attr_j = product_j

prod_sim_mat[(idi, idj)] = sim_score(attr_i, attr_j)

return prod_sim_mat

def product_similarity_graph(prod_sim_mat, products):

"""Combine Product Similarity Matrix and`products`

to construct a

Product Similarity Graph.

"""

# Create networkx graph

PSG = nx.DiGraph()

# Add nodes and attrs to graph

for product in products:

id_, *attrs = product

PSG.add_node(id_, attrs=attrs)

# Add edges and scores to nodes

for ind, score in prod_sim_mat.items():

if score > 0:

start, end = ind

PSG.add_edge(start, end, score=score)

PSG.add_edge(end, start, score=score)

return PSG

def generate_diff_attrs(prod_sim_graph):

"""Generate a set of attribute differences between each pair of connected

nodes in`prod_sim_graph`

and add to edges.

"""

for anchor, neighbour in prod_sim_graph.edges():

anchor_attrs = set(prod_sim_graph.node[anchor]['attrs'])

neighbour_attrs = set(prod_sim_graph.node[neighbour]['attrs'])

prod_sim_graph.edge[anchor][neighbour]['diff_tags'] = \

neighbour_attrs - anchor_attrs

return prod_sim_graph

def generate_nav_tags(prod_sim_graph):

"""Generate a navigation tag map for each node of the`prod_sim_graph`

.

"""

for anchor in prod_sim_graph.nodes():

tag_map = {}

for neighbour in prod_sim_graph.neighbors(anchor):

for tag in prod_sim_graph[anchor][neighbour]['diff_tags']:

tag_map[tag] = tag_map.get(tag, []) + [neighbour]

prod_sim_graph.node[anchor]['nav_tags'] = tag_map

return prod_sim_graph

if __name__ == "__main__":

# Construct Product Similarity Matrix

prod_sim_mat = product_similarity_matrix(products)

# Construct Product Similarity Graph

PSG = product_similarity_graph(prod_sim_mat, products)

# Generate difference attributes and attach to edges

PSG = generate_diff_attrs(PSG)

# Generate navigation tags and attach to nodes

PSG = generate_nav_tags(PSG)

### Browse the PSG using simulated user inputs ###

# Setup simulated user inputs

anchor = 'P1'

tag_selections = ['white', 'fila', 'sock']

product_selections = ['P2']

# Run through simulated inputs

for selection in tag_selections:

print("Anchor: {}".format(anchor))

print("Tag selection: {}".format(selection))

# Graph Browser

products = PSG.node[anchor]['nav_tags'][selection]

if len(products)>1:

product_selection = product_selections.pop(0)

assert product_selection in products, "Bad selection"

anchor = product_selection

print("Product selection: {}".format(product_selection))

else:

anchor, = products

print("Bought product: {}".format(anchor))

print("\n\tFin.\n")