
Autoscaling Mastodon's Sidekiq queues with the KEDA operator

This article assumes Mastodon is installed as a containerized setup on top of Kubernetes (k0s/k3s/k8s), Rancher, Openshift, or similar.
Creating a kubernetes cluster or deploying Mastodon on kubernetes via helm is out of scope for this article.

Introduction

Almost any Mastodon administrator who has seen some (or rapid) growth of their instance has experienced the same issues: growing latency of posts, updates from other servers showing up late or not at all, image uploads not getting processed, etc.

 

The root of this issue lies in the architecture of Mastodon. Every time a new action (creating a post, uploading an image, interacting with another instance, ...) happens on a Mastodon instance, it is not processed immediately, but put into a job queue. This queueing layer consists of two pieces of software: the Sidekiq job processor and the Redis database that holds the queued jobs.
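Under the hood, each Sidekiq queue is simply a list in Redis, so its current length can be checked directly. A minimal sketch, assuming redis-cli access to the instance's Redis and Sidekiq's default key naming (queue:<name>):

redis-cli -h <redis-host> llen queue:default
redis-cli -h <redis-host> llen queue:ingress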

 

Depending on how busy an instance is, any default Mastodon setup will most likely run into the above-mentioned issues as its user base grows, since the default setup of the job queue can only cope with a moderate amount of requests before it is overwhelmed and the job "pipes" clog up.

latency_report.png

User reporting growing latencies (German)
Image courtesy of b2c@dest-unreachable.net (c)2023

Architecture overview

Flow of actions ("jobs") through a Mastodon instance.

On the right-hand side of the overview we can see the Sidekiq and Redis components.

 mastodon_architecture.jpg

Mastodon architecture, courtesy of Softwaremill / Adam Warski (c) 2022

Mastodon kubernetes installation

The recommended way of deploying Mastodon on kubernetes is via the official Mastodon helm chart.

Depending on the chosen configuration of the chart, this will produce several kubernetes objects for all the necessary services needed to run a Mastodon instance.

 

The objects we are interested in for this article are the Sidekiq containers ("pods") which handle different kinds of tasks. Again, depending on the helm configuration, those queues can be spread out over multiple pods, e.g.:

  • default
  • ingress
  • push / pull
  • mailer
  • scheduler

See Mastodon's documentation on queues to understand what these do in detail. For now, let's focus on the default, ingress and push/pull queues, as those are the most important ones.

 

Although a single pod can be configured to handle several queues in parallel, it is highly recommended to separate these pods by queue type to enable the desired scale-out capabilities.
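With the official helm chart, this split is configured via the list of Sidekiq workers in the chart values. The snippet below is only an illustrative sketch - the exact keys (assumed here to live under mastodon.sidekiq.workers) and the resulting deployment names depend on the chart version in use:

mastodon:
  sidekiq:
    workers:
      - name: default
        concurrency: 25
        queues:
          - default
      - name: ingress
        concurrency: 25
        queues:
          - ingress
      - name: push-pull
        concurrency: 25
        queues:
          - push
          - pull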

 

mastodon_helm_deployment.png

Openshift "Developer" view of a Mastodon installation, deployed via helm.

Image courtesy of b2c@dest-unreachable.net (c)2023

Challenge

Now, depending on the usage pattern of the instance, the time of day and possibly many other factors, different queues will see varying amounts of load. Certain events (posts going viral, major news events, etc.) may even cause temporary peak loads that subside quickly.

 

Although monitoring solutions can (and should!) be put in place to detect such scenarios, admin interaction will still be required to scale up the deployments of the pods handling the affected queues. This usually leads to over-provisioning: permanently scaling up the different deployments just in case, to be prepared for such situations. That, of course, wastes resources and is technically not sound. We should be able to do better.

 

But what if we had something that checks the current load of the queues in Redis and scales pods accordingly? Of course, we could have the kubernetes built-in horizontalPodAutoscaler ("HPA") check the CPU usage of the pods and scale on that, but this is not very elegant and can be misleading.
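For comparison, a purely CPU-based autoscaler for one of the Sidekiq deployments would look roughly like the sketch below (plain kubernetes autoscaling/v2, names borrowed from the examples later in this article):

---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mastodon-sidekiq-worker-default
  namespace: mastodon
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mastodon-sidekiq-worker-default
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80

A Sidekiq worker that is mostly waiting on slow remote instances can have a long queue while using very little CPU, so CPU utilization is a poor proxy for the metric we actually care about: the queue length.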


Solution

In steps the KEDA operator, an event-driven autoscaler that can collect metrics from resources outside the cluster and scale pods based on this information.

 

Specifically, we are interested in the number of jobs in the Redis queues. Luckily, KEDA has a scaler just for this: the Redis lists scaler, which we will use to scale our deployments without any admin interaction.

 

Sweet! But how does one do it? 

 

Autoscaling Sidekiq pods will create additional connections to the PostgreSQL database. Ensure your connection limit is high enough and/or deploy a connection pool (e.g. PgBouncer).
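The available headroom can be checked on the database itself, for example (host, user and database name are placeholders for your setup):

psql -h <postgres-host> -U mastodon -d mastodon_production -c "SHOW max_connections;"
psql -h <postgres-host> -U mastodon -d mastodon_production -c "SELECT count(*) FROM pg_stat_activity;"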

keda-arch.png

Keda architecture, image courtesy of keda.sh (c)2023

Implementation

To get this working, we need to do two things:

  • Install the KEDA operator
  • Create HPA configs for the deployments we need to scale automatically
 

Installing the operator

Installing the operator is straightforward and works as advertised in the KEDA documentation.

Again, we utilize helm to deploy the operator:

 

This has been confirmed working on:
 - kubernetes 1.25
 - Openshift / OKD 4.12
Your mileage may vary on different platforms/versions.

 

Operator installation:

helm repo add kedacore https://kedacore.github.io/charts
helm repo update
kubectl create namespace keda
helm install keda kedacore/keda --namespace keda
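
Once the chart is installed, the operator pods should come up in the keda namespace (exact pod names depend on the KEDA version):

kubectl get pods --namespace keda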

Creating the HPA objects

 

First, we need a way to authenticate to Redis. This can be achieved by creating a triggerAuthentication object which references the secret holding the Redis database password.
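The secret name and key must match an existing secret in the Mastodon namespace. Whether the referenced key really holds the Redis password can be checked like this (secret name taken from the example below, adjust to your setup):

kubectl -n mastodon get secret mastodon-database-secrets -o jsonpath='{.data.redis-password}' | base64 -d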


KEDA uses so-called scaledObject definitions, which in turn create horizontalPodAutoscalers to scale our workloads.

 

In the example configuration, we start to scale up our deployment when the number of jobs in the default queue reaches 1500.

 

If this is not enough to reduce the jobs in the queue below the threshold, every other minute another pod will be deployed, up to a maximum of four.

 

If, for any reason, the HPA should fail, the deployment will be scaled to two pods to avoid an outage of the service until the issue can be inspected and rectified.

 

Example triggerAuthentication:

---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: trigger-auth-redis-secret
  namespace: mastodon
spec:
  secretTargetRef:
  - key: redis-password
    name: mastodon-database-secrets
    parameter: password

 

Example scaledObject:

---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  labels:
    scaledobject.keda.sh/name: mastodon-sidekiq-worker-default
  name: mastodon-sidekiq-worker-default
  namespace: mastodon
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleUp:
          policies:
          - periodSeconds: 60
            type: Pods
            value: 1
          stabilizationWindowSeconds: 300
  cooldownPeriod: 300
  maxReplicaCount: 4
  minReplicaCount: 1
  pollingInterval: 30
  fallback:
    failureThreshold: 3
    replicas: 2
  scaleTargetRef:
    kind: Deployment
    name: mastodon-sidekiq-worker-default
  triggers:
  - authenticationRef:
      name: trigger-auth-redis-secret
    metadata:
      address: 10.10.10.50:6379
      listLength: "1500"
      listName: queue:default
    metricType: AverageValue
    type: redis

These are just some examples! 
Adjust name, namespace, IP address and port of Redis, etc. to your specific setup.
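
Once adjusted, both manifests can be applied to the namespace of the Mastodon instance (the file names are just placeholders):

kubectl -n mastodon apply -f trigger-auth-redis.yaml
kubectl -n mastodon apply -f scaledobject-sidekiq-default.yaml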

Testing the setup

After the objects have been created in the namespace of the Mastodon instance, we can check if the setup works as expected.
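
A quick first check is whether KEDA accepted the scaledObject and created the corresponding HPA:

kubectl -n mastodon get scaledobjects

The READY and ACTIVE columns indicate whether the scaler can reach Redis and whether the trigger is currently firing.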

 


If your instance is not generating enough jobs to reach the threshold, you can just scale down a deployment to zero pods - this should trigger the scaler eventually.
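
For example, to force the ingress queue to back up (deployment name as shown in the example output below):

kubectl -n mastodon scale deployment mastodon-sidekiq-worker-ingress --replicas=0

With no worker processing the queue, its length will grow past the threshold and KEDA will scale the deployment back up.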

 

Checking the state of the autoscaler:

 

The TARGETS column should display the threshold, as well as the current number of jobs in the queue.

This can be cross-referenced with the values shown in the Mastodon admin interface under "Sidekiq" -> "Enqueued" to ensure that the correct queue is checked and the integration is working as expected.

 

~# kubectl get hpa
NAME                                         REFERENCE                                      TARGETS          MINPODS   MAXPODS   REPLICAS   AGE
keda-hpa-mastodon-sidekiq-worker-ingress     Deployment/mastodon-sidekiq-worker-ingress     1567/1500        1         4         1          11d

 

Checking the autoscaler events in the cluster (abbreviated):

 

As soon as the scaler activates, the following messages will be visible in the kubernetes event view, and new Sidekiq pods will be created:

 

~# kubectl get events -w
94m         Normal   SuccessfulRescale        horizontalpodautoscaler/keda-hpa-mastodon-sidekiq-worker-ingress   New size: 2; reason: external metric s0-redis-queue-ingress(&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: mastodon-sidekiq-worker-ingress,},MatchExpressions:[]LabelSelectorRequirement{},}) above target
94m         Normal   ScalingReplicaSet        deployment/mastodon-sidekiq-worker-ingress                         Scaled up replica set mastodon-sidekiq-worker-ingress-674ccf4b56 to 2 from 1
94m         Normal   SuccessfulCreate         replicaset/mastodon-sidekiq-worker-ingress-674ccf4b56              Created pod: mastodon-sidekiq-worker-ingress-674czvb8q
86m         Normal   SuccessfulRescale        horizontalpodautoscaler/keda-hpa-mastodon-sidekiq-worker-ingress   New size: 1; reason: All metrics below target
86m         Normal   ScalingReplicaSet        deployment/mastodon-sidekiq-worker-ingress                         Scaled down replica set mastodon-sidekiq-worker-ingress-674ccf4b56 to 1 from 2
86m         Normal   SuccessfulDelete         replicaset/mastodon-sidekiq-worker-ingress-674ccf4b56              Deleted pod: mastodon-sidekiq-worker-ingress-674czvb8q

Additional caveats

Depending on setup specifics, certain parts of the configuration should be adjusted.

 

Considerations regarding Redis namespaces:

If REDIS_NAMESPACE is defined in the environment, listName must be adjusted to reference the namespace:
listName: <REDIS_NAMESPACE>:queue:default

E.g. if REDIS_NAMESPACE=masto5, set listName to:
listName: masto5:queue:default
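
If in doubt which key names are actually in use, the queue keys can be listed directly in Redis (add -a <password> if authentication is required):

redis-cli -h 10.10.10.50 --scan --pattern '*queue:*'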

Considerations regarding pods with multiple Sidekiq roles:

It is common practice to have the push and pull queues handled by one pod. In this case, triggers for both queues must be defined in the scaledObject - triggers is a YAML list and can hold more than one entry.

Just add additional entries to the triggers section of the scaledObject:

 

  triggers:
  - authenticationRef:
      name: trigger-auth-redis-secret
    metadata:
      address: 10.10.10.50:6379
      listLength: "1500"
      listName: queue:push
    metricType: AverageValue
    type: redis
  - authenticationRef:
      name: trigger-auth-redis-secret
    metadata:
      address: 10.10.10.50:6379
      listLength: "1500"
      listName: queue:pull
    metricType: AverageValue
    type: redis

Considerations regarding performance and pod count:

The referenced values for pod count, the number of jobs in the queue, when to scale up or down, and the time between scale operations are highly dependent on the underlying hardware and the overall utilization of the cluster.

They have proven sufficient in the author's setup and can serve as a starting point, but will most likely need to be adjusted.