A Crossplane Introduction
I recently needed to test some Crossplane compositions for work. To do that, I first started off by building out a testing environment consisting of an ArgoCD instance and a local Kubernetes cluster using vcluster (managed by Crossplane) but eventually realized that my setup was getting a little too complicated for a simple playground. In search of a simpler setup, I bought a couple of Raspberry Pis and installed Talos Linux on them instead.
I wrote a couple posts about this experience. You can read about the Talos Linux on Raspberry Pi side of things here. The other post is the one you are currently reading.
Working in this new homelab environment gave me a lot of time to experiment with Crossplane undisturbed. Now that things are finally set up the way I want, I wanted to take some time and share what I’ve learned about Crossplane and how it actually approaches infrastructure management in practice.
Let’s get started.
A declarative infrastructure landscape
If you have been doing this devops/platform-engineering thing for a while, you know that the market for declaring infrastructure as code is pretty heavily saturated. Tools like Terraform, Pulumi, Ansible, and CloudFormation have been around for a long time and have massive communities supporting them.
With so many options out there, why do we need another one? Let’s start by taking a look at how things have traditionally been done.
Say you want to supply developers with a way to stand up a database for their new application. One way you could do this is by creating a Terraform module for an AWS RDS instance. Something like this:
module "db" {
  source = "terraform-aws-modules/rds/aws"

  identifier = "demodb"

  engine               = "mysql"
  engine_version       = "8.0"
  family               = "mysql8.0"
  major_engine_version = "8.0"
  instance_class       = "db.t4g.large"

  allocated_storage     = 20
  max_allocated_storage = 100

  db_name  = "demodb"
  username = "user"
  port     = 3306

  manage_master_user_password = false
  multi_az                    = true

  db_subnet_group_name   = module.vpc.database_subnet_group
  vpc_security_group_ids = [module.security_group.security_group_id]
}
There are quite a few knobs to turn here. If a developer is standing up a brand new database, do they know which VPC the database should be placed in? Do they know which AWS region they should be using? What about the security groups?
As platform engineers, we usually wrap a module like this in another custom Terraform module so that we can answer some of these questions dynamically on the developer’s behalf. We give the developer a much smaller subset of knobs they need to turn. We essentially create an API contract between the platform team and the developer: we agree to provide the end result as long as they provide the required inputs. It might look something like this:
module "db" {
  source = "git@github.com:my-company/terraform-modules.git//database"

  engine         = "mysql"
  engine_version = "8.0"
  size           = "large"
}
Our custom module will then take those inputs and map them back to the underlying AWS specific module. The developer doesn’t need to know how the networking is handled because our root module can look up that information dynamically and pass it along into the underlying module.
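To make this concrete, the internals of that wrapper module might look roughly like the sketch below. The t-shirt-size map, the subnet group name, and the data source are hypothetical placeholders, not the company module’s actual contents:

```hcl
variable "engine" { type = string }
variable "engine_version" { type = string }
variable "size" { type = string }

# Map the friendly t-shirt size onto a concrete instance class.
locals {
  size_map = {
    small  = "db.t4g.small"
    medium = "db.t4g.medium"
    large  = "db.t4g.large"
  }
  instance_class = local.size_map[var.size]
}

# Look up networking details so the developer never has to.
data "aws_db_subnet_group" "platform" {
  name = "platform-databases"
}

module "db" {
  source = "terraform-aws-modules/rds/aws"

  engine               = var.engine
  engine_version       = var.engine_version
  instance_class       = local.instance_class
  db_subnet_group_name = data.aws_db_subnet_group.platform.name
  # ...remaining settings filled in with platform defaults
}
```

The developer only ever sees the three variables at the top; everything else is resolved inside the module.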
This is a great pattern and it generally works well but it starts showing weaknesses when an organization needs to pivot.
What happens if your developers write code locally on their laptops but use AWS in staging and production? Spinning up a local copy of RDS isn’t really feasible, so they have to run a Docker container instead. Now your devs have two different workflows for deploying a database.
What happens when your company starts an initiative to move from AWS to Azure? Well now your module needs to know how to provision databases in Azure too and a lot of that underlying networking information is going to be completely different.
The solution almost always ends up being asking developers to rewrite all of their modules or having the platform engineering team rewrite all the internal modules to include massive amounts of branching logic.
Crossplane aims to solve this differently. It separates the platform API contract and the underlying infrastructure into two different concepts.
The Crossplane approach
Crossplane is a Kubernetes add-on. Its goal is to allow teams to build a platform that utilizes the Kubernetes API to manage anything outside of Kubernetes itself.
If we want to continue our example of provisioning a database, Crossplane provides us with two tools: CompositeResourceDefinitions (XRD) and Compositions.
CompositeResourceDefinitions
A CompositeResourceDefinition (or XRD for short) is how you build the platform API contract we talked about in the last chapter. They are essentially identical to Kubernetes CustomResourceDefinitions (CRDs), except that Crossplane watches XRDs and uses them to automatically generate multiple CRDs for you.
Let’s look at an example. This creates an XRD called xdatabases.my-company.com:

apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xdatabases.my-company.com
spec:
  group: my-company.com
  names:
    kind: XDatabase
    plural: xdatabases
  claimNames:
    kind: Database
    plural: databases
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                engine:
                  type: string
                engine_version:
                  type: string
                size:
                  type: string
              required:
                - engine
                - engine_version
                - size
If you have worked with CRDs before, this should look very familiar. All it is really doing is defining the shape of our Database API.

When you apply an XRD to your cluster, Crossplane will watch for it and automatically create the corresponding CRDs.

You’ll get an XDatabase, which is cluster-scoped. This works exactly like PersistentVolumes in Kubernetes.

If you provide claimNames in your XRD (like we did here), Crossplane will also automatically create a namespace-scoped CRD called Database. Crossplane calls these Claims. They work exactly like PersistentVolumeClaims: a developer writes a Database manifest, and Crossplane handles allocating the resources to fulfill that claim at the cluster level.
Compositions
Compositions tell Crossplane what actually needs to happen when someone requests a Database.

A Composition works by defining a list of steps that need to be followed. This is called a Composition Pipeline. In each step of the pipeline, Crossplane executes a Function (which can also be customized if you choose).

Let’s look at an example. First, let’s install the Function we want to use, function-patch-and-transform, into our cluster.
apiVersion: pkg.crossplane.io/v1beta1
kind: Function
metadata:
  name: function-patch-and-transform
spec:
  package: xpkg.upbound.io/crossplane/function-patch-and-transform:v0.3.0
That takes care of our dependencies. Now let’s look at the Composition. This tells Crossplane that when someone requests an API matching our XDatabase XRD, we want to create an AWS RDS cluster.
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: xdatabases-aws-rds.my-company.com
spec:
  compositeTypeRef:
    apiVersion: my-company.com/v1alpha1
    kind: XDatabase
  mode: Pipeline
  pipeline:
    - step: patch-and-transform
      functionRef:
        name: function-patch-and-transform
      input:
        apiVersion: pt.fn.crossplane.io/v1beta1
        kind: Resources
        resources:
          - name: rds-cluster
            base:
              apiVersion: rds.aws.m.upbound.io/v1beta1
              kind: Cluster
              spec:
                forProvider:
                  region: us-west-1
            patches:
              - type: FromCompositeFieldPath
                fromFieldPath: spec.engine
                toFieldPath: spec.forProvider.engine
              - type: FromCompositeFieldPath
                fromFieldPath: spec.engine_version
                toFieldPath: spec.forProvider.engineVersion
Every time a user creates a Claim in their project’s namespace, Crossplane will invoke this pipeline at the cluster level and push the desired state to AWS. Using XRDs and Compositions in tandem gives us a single control plane API that we can extend across arbitrary cloud providers and architectures.
We decouple the developer’s request for architecture from the actual architecture implementation. The request will always be the same regardless of what infrastructure needs deploying under the hood.
A multi-cloud approach designed for local development
Alright, so we now have a new API called Database that our users can use. What does the developer’s manifest actually look like? It might look something like this:

apiVersion: my-company.com/v1alpha1
kind: Database
metadata:
  name: example-service-db
spec:
  engine: "postgres"
  engine_version: "16.4"
  size: "large"
A few immediate benefits jump out. Since we are using the Kubernetes API, we can group this manifest with the rest of the development team’s Kubernetes manifests. For example, their database definition could sit in the same codebase directly alongside their Deployment or Service manifests.

Because there is no distinction between AWS resources and Kubernetes resources, we don’t need different tooling to manage them. Tools like ArgoCD can naturally pick up these manifests and sync them exactly as they would any other Kubernetes resource. The team doesn’t need CI jobs that run terraform apply anymore, because ArgoCD does it for them, treating AWS resources identically to Kubernetes resources.
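As a sketch, a single ArgoCD Application could then sync the Database claim together with everything else in the team’s manifest directory. The repository URL and paths here are hypothetical:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-service
  namespace: argocd
spec:
  project: default
  source:
    # Hypothetical repo: Deployments, Services, and the Database
    # claim all live under the same path.
    repoURL: https://github.com/my-company/example-service.git
    targetRevision: main
    path: manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: example-service
  syncPolicy:
    automated: {}
```

Nothing about this Application knows or cares that one of the synced manifests ultimately provisions cloud infrastructure.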
Since Crossplane functions are completely customizable, if a team has an edge case, you can support those individually without impacting other teams or having to bake in overly complicated logic that spans across your fleet. It enables a significantly better developer experience without the massive management overhead we would have otherwise.
There’s another benefit though that is the real game-changer here: Compositions can be applied conditionally depending on the cluster utilizing them.
By default, when Crossplane encounters a Database claim, it looks at the cluster to find a Composition that can fulfill it. By explicitly labeling Compositions with their intended target workloads, we can instruct our production and staging clusters to fulfill the claim using an AWS RDS Composition.
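One concrete way to wire this up (the label key and values below are made up for illustration): give each Composition an environment label, then have each cluster’s copy of the XRD point its defaultCompositionRef at the Composition meant for that environment. The claim itself never changes between clusters.

```yaml
# Label the Composition with its intended environment.
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: xdatabases-aws-rds.my-company.com
  labels:
    my-company.com/environment: cloud
spec:
  compositeTypeRef:
    apiVersion: my-company.com/v1alpha1
    kind: XDatabase
  # ...pipeline as shown earlier
---
# In the XRD applied to production and staging clusters, default to the
# AWS RDS Composition. Developer clusters would default to a local one.
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xdatabases.my-company.com
spec:
  group: my-company.com
  names:
    kind: XDatabase
    plural: xdatabases
  defaultCompositionRef:
    name: xdatabases-aws-rds.my-company.com
  # ...claimNames and versions as shown earlier
```

Claims can also opt into a specific implementation via spec.compositionSelector.matchLabels, which is where the labels come into play.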
We could ALSO, however, provide developer compute environments like vcluster with a Composition that installs PostgreSQL via CloudNativePG (CNPG) directly onto the Kubernetes cluster whenever the exact same Database claim is seen, and do so securely without touching AWS at all.
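A sketch of what that local Composition could look like, assuming provider-kubernetes is installed so Crossplane can manage arbitrary in-cluster objects. The namespace and sizing are illustrative:

```yaml
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: xdatabases-cnpg.my-company.com
  labels:
    my-company.com/environment: local
spec:
  compositeTypeRef:
    apiVersion: my-company.com/v1alpha1
    kind: XDatabase
  mode: Pipeline
  pipeline:
    - step: patch-and-transform
      functionRef:
        name: function-patch-and-transform
      input:
        apiVersion: pt.fn.crossplane.io/v1beta1
        kind: Resources
        resources:
          - name: cnpg-cluster
            base:
              # provider-kubernetes wraps the CNPG resource in an Object
              # so Crossplane can create it inside the cluster itself.
              apiVersion: kubernetes.crossplane.io/v1alpha2
              kind: Object
              spec:
                forProvider:
                  manifest:
                    apiVersion: postgresql.cnpg.io/v1
                    kind: Cluster
                    metadata:
                      namespace: databases
                    spec:
                      instances: 1
                      storage:
                        size: 1Gi
```

The same XDatabase fields could be patched into the nested manifest exactly as in the RDS example.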
If handled correctly, the development team has a single manifest that defines what database their service needs, and they never need to know or care what handles the provisioning on the other side. They get locally accessible databases when developing locally and compliant, production-grade databases when pushing code to the cloud, all with the exact same workflow, completely transparent to how the platform handles it. The value here shouldn’t be underestimated for organizations that face strict and varied compliance requirements in the cloud but don’t want to step on developer velocity.