Distributed Cache

by Rohit Sharma - 19 Apr 2018

Using Akka cluster-sharding and Akka HTTP on Kubernetes

This article captures the implementation of an application serving data over HTTP which is stored in cluster-sharded actors and deployed on Kubernetes.

Use case: an application that serves data over HTTP at a high request rate, with latency on the order of 10 ms, while only limited database IOPS are available.

My initial idea was to cache the data in memory on each instance, which worked pretty well for some time. But this meant larger instances, because the cached data was duplicated on every instance behind the load balancer. As an alternative, I wanted to use Kubernetes for this problem and do a proof of concept (PoC) of a distributed cache with Akka cluster-sharding and Akka HTTP on Kubernetes.
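
To make the difference from per-instance caching concrete, here is a minimal Python sketch of the partitioning idea behind a sharded cache: every key maps to exactly one shard, and every shard lives on exactly one node, so no entry is duplicated. This is not the PoC's code and not Akka's API. The key, the shard count, and the round-robin placement are assumptions for illustration only; in Akka, shard placement is decided by the cluster itself.

```python
import zlib
from typing import Sequence


def shard_for(key: str, number_of_shards: int) -> int:
    """Map a cache key to a shard id using a hash that is stable across processes."""
    # zlib.crc32 is deterministic, unlike Python's per-process salted hash() for strings.
    return zlib.crc32(key.encode("utf-8")) % number_of_shards


def node_for(shard: int, nodes: Sequence[str]) -> str:
    """Pick the node holding a shard (round-robin stand-in for real shard allocation)."""
    return nodes[shard % len(nodes)]


# Hypothetical key and shard count, chosen only for illustration.
nodes = ["distributed-cache-0", "distributed-cache-1", "distributed-cache-2"]
shard = shard_for("article-42", number_of_shards=30)
owner = node_for(shard, nodes)
print(f"key 'article-42' -> shard {shard} -> held only by {owner}")
```

Because each key has a single owner, adding nodes adds cache capacity instead of adding more copies of the same data.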

This article is by no means a complete tutorial on Akka cluster sharding or Kubernetes. It outlines the knowledge I gained while doing this PoC. The code for this PoC can be found here.

Let’s dig into the details of this implementation.

To form an Akka Cluster, there needs to be a pre-defined, ordered set of contact points, often called seed nodes. Each Akka node will try to register itself with the first node from the list of seed nodes. Once all the seed nodes have joined the cluster, any new node can join the cluster programmatically.

The ordering is important here, because if the first seed node changes frequently, the chance of split-brain increases. More info about Akka Clustering can be found here.

So the challenge with Kubernetes was providing this ordered set of predefined nodes, and this is where StatefulSets and Headless Services come to the rescue.

StatefulSet guarantees stable and ordered pod creation, which satisfies the requirement of our seed nodes, and Headless Service is responsible for their deterministic discovery in the network. So, the first node will be “<application>-0” and the second will be “<application>-1” and so on.

  • <application> is replaced by the actual name of the application

The DNS for the seed nodes will be of the form:

<application-name>-<ordinal>.<service-name>.<namespace>.svc.cluster.local
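
As a small illustration of this naming scheme, the following Python sketch generates the seed node host names for a given replica count. The function name and parameters are hypothetical. The example values match the manifests used in the steps that follow, and `cluster.local` is assumed to be the cluster domain.

```python
from typing import List


def seed_node_hosts(app: str, service: str, namespace: str, replicas: int) -> List[str]:
    """Build the stable DNS names of StatefulSet pods behind a Headless Service."""
    # StatefulSet ordinals start at 0 and are assigned in order.
    return [
        f"{app}-{ordinal}.{service}.{namespace}.svc.cluster.local"
        for ordinal in range(replicas)
    ]


hosts = seed_node_hosts("distributed-cache", "distributed-cache", "default", 3)
for host in hosts:
    print(host)
```

Since the names depend only on the ordinal, the list stays the same across pod restarts, which is what makes it usable as a fixed seed node list.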

Steps:

1. Start by creating the Kubernetes resources. First, the Headless Service, which is responsible for deterministic discovery of the seed nodes (Pods), can be created using the following manifest:

kind: Service
apiVersion: v1
metadata:
  name: distributed-cache
  labels:
    app: distributed-cache
spec:
  clusterIP: None
  selector:
    app: distributed-cache
  ports:
    - port: 2551
      targetPort: 2551
      protocol: TCP

Note that “clusterIP” is set to “None”, which indicates that this is a Headless Service.

2. Create a StatefulSet, which is a manifest for ordered pod creation:

# apps/v1beta2 was removed in Kubernetes 1.16; use "apps/v1" on newer clusters
apiVersion: "apps/v1beta2"
kind: StatefulSet
metadata:
  name: distributed-cache
spec:
  selector:
    matchLabels:
      app: distributed-cache
  serviceName: distributed-cache
  replicas: 3
  template:
    metadata:
      labels:
        app: distributed-cache
    spec:
      containers:
        - name: distributed-cache
          image: "localhost:5000/distributed-cache-on-k8s-poc:1.0"
          env:
            - name: AKKA_ACTOR_SYSTEM_NAME
              value: "distributed-cache-system"
            - name: AKKA_REMOTING_BIND_PORT
              value: "2551"
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: AKKA_REMOTING_BIND_DOMAIN
              value: "distributed-cache.default.svc.cluster.local"
            - name: AKKA_SEED_NODES
              value: "distributed-cache-0.distributed-cache.default.svc.cluster.local:2551,distributed-cache-1.distributed-cache.default.svc.cluster.local:2551,distributed-cache-2.distributed-cache.default.svc.cluster.local:2551"
          ports:
            - containerPort: 2551
          readinessProbe:
            httpGet:
              port: 9000
              path: /health
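
The environment variables in this manifest carry everything a pod needs to find its own address and the seed nodes. The Python sketch below shows one way they could be combined. How the PoC actually consumes them is not shown in this article, so the helper names and the `akka.tcp://` address scheme (Akka classic remoting) are assumptions.

```python
import os
from typing import List, Mapping


def own_hostname(env: Mapping[str, str] = os.environ) -> str:
    """Combine the pod name and the Headless Service domain into this pod's DNS name."""
    return f'{env["POD_NAME"]}.{env["AKKA_REMOTING_BIND_DOMAIN"]}'


def seed_node_addresses(env: Mapping[str, str] = os.environ) -> List[str]:
    """Turn the comma-separated host:port list into seed node addresses, keeping order."""
    system = env["AKKA_ACTOR_SYSTEM_NAME"]
    seeds = [s.strip() for s in env["AKKA_SEED_NODES"].split(",") if s.strip()]
    # Assumed address format: akka.tcp://<system>@<host>:<port>
    return [f"akka.tcp://{system}@{seed}" for seed in seeds]


# Example environment as the second pod (ordinal 1) would see it.
domain = "distributed-cache.default.svc.cluster.local"
example_env = {
    "AKKA_ACTOR_SYSTEM_NAME": "distributed-cache-system",
    "POD_NAME": "distributed-cache-1",
    "AKKA_REMOTING_BIND_DOMAIN": domain,
    "AKKA_SEED_NODES": ",".join(f"distributed-cache-{i}.{domain}:2551" for i in range(3)),
}

print(own_hostname(example_env))
for address in seed_node_addresses(example_env):
    print(address)
```

The order of the list is preserved on purpose, because the first seed node plays a special role when the cluster forms.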

3. Create a Service that forwards HTTP traffic, which arrives from outside the cluster through the Ingress defined in the next step, to the pods:

apiVersion: v1
kind: Service
metadata:
  labels:
    app: distributed-cache
  name: distributed-cache-service
spec:
  selector:
    app: distributed-cache
  type: ClusterIP
  ports:
    - port: 80
      protocol: TCP
      # this needs to match your container port
      targetPort: 9000

4. Create an Ingress, which defines a set of rules for routing traffic from the outside internet to Services:

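# extensions/v1beta1 Ingress was removed in Kubernetes 1.22; newer clusters need
# networking.k8s.io/v1, which uses a different backend schema
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: distributed-cache-ingress
spec:
  rules:
    # DNS name your application should be exposed on
    - host: "distributed-cache.com"
      http:
        paths:
          - backend:
              serviceName: distributed-cache-service
              servicePort: 80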

And the distributed cache is ready to use.

Summary
This article covers Akka cluster-sharding on Kubernetes, its prerequisites of an ordered set of seed nodes and their deterministic discovery in the network, and how both can be met with StatefulSets and Headless Services.

This approach of caching data in a distributed fashion offered the following advantages:

  • Fewer database lookups, saving database IOPS
  • Efficient usage of resources; fewer instances as a result of no duplication of data
  • Lower latencies to serve data

This PoC opens up new ways of thinking about how we cache data in memory. Give it a try (all steps to run it locally are mentioned in the README).
