Multitenancy in Kubernetes, a billion-dollar problem
When we talk about a multitenant system, there’s a spectrum of possibilities: hard multitenancy, soft multitenancy, and solutions in between. The goal of multitenancy is to let a system support different levels of isolation between users (also referred to as tenants) so that they can install and manage resources without affecting one another. Anecdotally, companies with large-scale Kubernetes clusters, such as Spotify and Apple, use some form of multitenancy so that different teams can collaborate within a single cluster. For example, data science teams can safely work alongside engineering and devops teams within a multitenant system.
Hard multitenancy refers to complete isolation between tenants. A good analogy is a VM on its host operating system. When you fire up a generic EC2 instance on AWS, you get a VM running on a hypervisor that is completely isolated from other VMs on the host. The VM runs with its own kernel, and resource limits on the VM prevent noisy-neighbor problems, where one user’s workload interferes with another’s. There is no way to access the contents of other VMs or to change global settings on the node.
Soft multitenancy refers to a degree of isolation that is not as strict as hard multitenancy but is still effective. An example is namespaces in Kubernetes. A namespace is a logical grouping of resources and is the default multitenancy mechanism in Kubernetes. Using RBAC policies, one can limit a workload (for example, an operator) to a single namespace so that it cannot affect workloads in other namespaces. Some key resources, such as PodDisruptionBudgets, are namespace-scoped and only apply to pods in a given namespace. Garbage collection for parent/child resources works on a namespace basis as well: cross-namespace owner references are disallowed by design and have to be cleaned up manually. A good analogy is a container on its host: there is some degree of isolation at the process level, but because containers share the host kernel there is less isolation than with a VM.
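As a rough sketch of what this namespace scoping can look like in practice, the following Go program uses client-go to create a Role and RoleBinding that grant a hypothetical "team-a-operator" service account permissions only inside a "team-a" namespace. The names, verbs, and kubeconfig path are placeholder assumptions, not prescriptions:

```go
package main

import (
	"context"

	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load kubeconfig from the default location; adjust as needed.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)
	ctx := context.Background()

	// A Role is namespace-scoped: these permissions apply only inside "team-a".
	role := &rbacv1.Role{
		ObjectMeta: metav1.ObjectMeta{Name: "team-a-operator", Namespace: "team-a"},
		Rules: []rbacv1.PolicyRule{{
			APIGroups: []string{"apps"},
			Resources: []string{"deployments"},
			Verbs:     []string{"get", "list", "watch", "create", "update"},
		}},
	}

	// Bind the Role to the operator's ServiceAccount in the same namespace.
	binding := &rbacv1.RoleBinding{
		ObjectMeta: metav1.ObjectMeta{Name: "team-a-operator", Namespace: "team-a"},
		Subjects: []rbacv1.Subject{{
			Kind:      "ServiceAccount",
			Name:      "team-a-operator",
			Namespace: "team-a",
		}},
		RoleRef: rbacv1.RoleRef{
			APIGroup: "rbac.authorization.k8s.io",
			Kind:     "Role",
			Name:     "team-a-operator",
		},
	}

	if _, err := clientset.RbacV1().Roles("team-a").Create(ctx, role, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
	if _, err := clientset.RbacV1().RoleBindings("team-a").Create(ctx, binding, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
}
```

Because a Role (unlike a ClusterRole) cannot grant access outside its own namespace, a workload bound this way is contained even if it misbehaves.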
Note that this problem is distinct from maintaining separate environments for dev/staging/prod, which is standard best practice for large production deployments.
Why is multitenancy hard in Kubernetes?
Kubernetes has no first-class concept of a tenant, and although workloads can be given some degree of isolation via namespaces and RBAC rules, some resources are cluster-wide in scope, which can result in workloads inadvertently affecting one another. For example, CustomResourceDefinitions (CRDs) extend the control plane of the cluster by introducing new APIs. They are versioned and cluster-scoped: any user with permission to list CRDs can see them, whether or not they installed them. If one user wants a newer version of a CRD that is not backwards compatible with existing CRs on the cluster, they risk breaking other workloads when they upgrade the CRD.
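To make this concrete, here is a small client-go sketch that inspects which versions of a hypothetical "widgets.example.com" CRD are currently served and which is used for storage; a cautious tenant might run a check like this before upgrading a CRD that other teams may depend on:

```go
package main

import (
	"context"
	"fmt"

	apiextensionsclientset "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := apiextensionsclientset.NewForConfigOrDie(config)

	// CRDs are cluster-scoped, so this call is not namespaced: everyone who can
	// read CRDs sees the same object, regardless of who installed it.
	crd, err := client.ApiextensionsV1().CustomResourceDefinitions().Get(
		context.Background(), "widgets.example.com", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	// Inspect which versions are served and which one is used for storage before
	// deciding whether an upgrade could break other tenants' existing CRs.
	for _, v := range crd.Spec.Versions {
		fmt.Printf("version=%s served=%t storage=%t\n", v.Name, v.Served, v.Storage)
	}
}
```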
Other important resources are also cluster-scoped, for example webhooks. An admission webhook determines whether certain resources should be allowed to be created in the cluster. For example, you might create an admission webhook that enforces pod security policies disallowing pods from running as root. Since this webhook intercepts all Pod creations, it can block other users who legitimately need to run pods as root (for example, networking pods or other low-level node configuration pods).
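Below is a minimal sketch of such a validating webhook: a plain HTTP handler that decodes the AdmissionReview, rejects any pod whose containers do not set runAsNonRoot, and, once registered via a ValidatingWebhookConfiguration matching Pod CREATE operations, answers for every pod in the cluster. The endpoint path and TLS certificate paths are placeholders:

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"

	admissionv1 "k8s.io/api/admission/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// validatePods rejects pods whose containers do not explicitly opt out of running as root.
func validatePods(w http.ResponseWriter, r *http.Request) {
	body, _ := io.ReadAll(r.Body)

	review := admissionv1.AdmissionReview{}
	if err := json.Unmarshal(body, &review); err != nil || review.Request == nil {
		http.Error(w, "invalid AdmissionReview", http.StatusBadRequest)
		return
	}

	pod := corev1.Pod{}
	if err := json.Unmarshal(review.Request.Object.Raw, &pod); err != nil {
		http.Error(w, "invalid Pod", http.StatusBadRequest)
		return
	}

	allowed := true
	message := ""
	for _, c := range pod.Spec.Containers {
		sc := c.SecurityContext
		if sc == nil || sc.RunAsNonRoot == nil || !*sc.RunAsNonRoot {
			allowed = false
			message = fmt.Sprintf("container %q must set runAsNonRoot: true", c.Name)
			break
		}
	}

	// The response must echo the request UID so the API server can correlate it.
	review.Response = &admissionv1.AdmissionResponse{
		UID:     review.Request.UID,
		Allowed: allowed,
		Result:  &metav1.Status{Message: message},
	}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(review)
}

func main() {
	http.HandleFunc("/validate-pods", validatePods)
	// Admission webhooks must be served over TLS; the certificate paths are placeholders.
	http.ListenAndServeTLS(":8443", "tls.crt", "tls.key", nil)
}
```

Because the webhook registration is itself cluster-scoped, one team's policy here becomes everyone's policy, which is exactly the tension described above.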
The last example is operator multitenancy, which can be complex to implement. Operators run on the cluster and continuously reconcile resources, either within a single namespace or across the whole cluster. Users may want to install multiple versions of the same operator on the cluster, for example multiple instances of an AWS operator that creates cloud resources outside the cluster based on each team's IAM account. Different users have different permission levels and so require their own instance of the AWS operator. Safely installing and upgrading this kind of operator can be extremely challenging because core Kubernetes lacks a harder tenancy model. One common mitigation, sketched below, is to scope each operator instance's watches to a single namespace.
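Here is a rough sketch of that scoping using client-go shared informers; the "team-a" namespace and the 30-second resync interval are illustrative assumptions:

```go
package main

import (
	"fmt"
	"time"

	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// Restrict this operator instance's watches to the "team-a" namespace so it
	// never sees (or reconciles) another tenant's resources.
	factory := informers.NewSharedInformerFactoryWithOptions(
		clientset, 30*time.Second, informers.WithNamespace("team-a"))

	informer := factory.Apps().V1().Deployments().Informer()
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			d := obj.(*appsv1.Deployment)
			fmt.Printf("reconciling deployment %s/%s\n", d.Namespace, d.Name)
		},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	select {} // block forever; a real operator would drive a workqueue here
}
```

Combined with the namespaced RBAC shown earlier, each team can run its own copy of the operator with its own credentials, at the cost of running N copies instead of one.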
Why is multitenancy important?
- Money. Similar to the autoscaling problem (over-scale and waste money, under-scale and risk high latencies that cost revenue), multitenancy tackles the problem of distributing compute effectively. Giving users the right amount of compute for their needs is harder when each team has to set up and manage its own cluster. Managing clusters well is hard work, and every additional cluster adds operational effort at the expense of delivering business value. If teams can share the same set of resources within a cluster or across clusters, compute costs can go down.
- Security. Teams should be able to work within their environment without fear of affecting other teams when doing things like updates. Vulnerabilities in their software that lead to issues like privilege escalation should not compromise the entire cluster. A layered approach to security is best, and multitenancy can be part of that solution.
Some ways to solve the challenge
- Hierarchical namespaces, which work similarly to parent/child resources in Kubernetes. Parent namespaces can have child namespaces, and users in a child namespace do not have direct access to the parent namespace. This enables more isolation: users can have full access to their child namespace in a transparent way without affecting other child namespaces or the parent. For more on this project, check out the hnc repository.
- Virtualize the api-server so that each tenant has their own api-server instance that is synced with the master api-server (the tenant api-server can be either a physical pod or virtualized). The virtual-cluster project is developing this solution, and part of this work is now in the Cluster API Provider Nested SIG. See the virtualcluster repository for more info.
- Cluster-scope everything, including operators, and go with a clusters-per-team model. This is the solution proposed by members of the core Kubernetes API team, since changing core Kubernetes to incorporate multitenancy is extremely difficult and far off. This approach throws intra-cluster multitenancy out the window but leaves open the possibility of inter-cluster coordination. The Operator Lifecycle Manager team is considering this approach after its initial soft multitenancy solution, OperatorGroups, proved hard for users to understand and had edge cases.
- There are a variety of third-party solutions that attempt to tackle multitenancy in Kubernetes, among them Arktos (a fork focused on multitenancy) and Kiosk (an extension layer built on CRDs).
There is no simple solution, but it is worth investigating all the options and making an informed decision when considering multitenant solutions for the Kubernetes users in your organization.