A Crossplane Introduction

I recently needed to test some Crossplane compositions for work. To do that, I first started off by building out a testing environment consisting of an ArgoCD instance and a local Kubernetes cluster using vcluster (managed by Crossplane) but eventually realized that my setup was getting a little too complicated for a simple playground. In search of a simpler setup, I bought a couple of Raspberry Pis and installed Talos Linux on them instead.

I wrote a couple of posts about this experience. You can read about the Talos Linux on Raspberry Pi side of things here; the other post is the one you are currently reading.

Working in this new homelab environment gave me a lot of time to experiment with Crossplane undisturbed. Now that things are finally set up the way I want, I wanted to take some time and share what I’ve learned about Crossplane and how it actually approaches infrastructure management in practice.

Let's get started.

A declarative infrastructure landscape

If you have been doing this devops/platform-engineering thing for a while, you know that the market for declaring infrastructure as code is pretty heavily saturated. Tools like Terraform, Pulumi, Ansible, and CloudFormation have been around for a long time and have massive communities supporting them.

With so many options out there, why do we need another one? Let’s start by taking a look at how things have traditionally been done.

Say you want to supply developers with a way to stand up a database for their new application. One way you could do this is by creating a Terraform module for an AWS RDS instance. Something like this:

module "db" {
  source = "terraform-aws-modules/rds/aws"

  identifier = "demodb"

  engine               = "mysql"
  engine_version       = "8.0"
  family               = "mysql8.0"
  major_engine_version = "8.0"
  instance_class       = "db.t4g.large"

  allocated_storage     = 20
  max_allocated_storage = 100

  db_name  = "demodb"
  username = "user"
  port     = 3306

  manage_master_user_password = false

  multi_az               = true
  db_subnet_group_name   = module.vpc.database_subnet_group
  vpc_security_group_ids = [module.security_group.security_group_id]
}

There are quite a few knobs to turn here. If a developer is standing up a brand new database, do they know which VPC the database should be placed in? Do they know which AWS region they should be using? What about the security groups?

As platform engineers, we usually wrap a module like this in a custom Terraform module of our own so that we can answer some of these questions dynamically on the developer's behalf. We give the developer a much smaller set of knobs to turn. In effect, we create an API contract between the platform team and the developer: we agree to provide the end result as long as they provide the required inputs. It might look something like this:

module "db" {
  source = "git@github.com:my-company/terraform-modules.git//database"

  engine         = "mysql"
  engine_version = "8.0"
  size           = "large"
}

Our custom module then takes those inputs and maps them to the underlying AWS-specific module. The developer doesn't need to know how the networking is handled, because our module can look up that information dynamically and pass it into the underlying module.

This is a great pattern and it generally works well, but it starts showing weaknesses when an organization needs to pivot.

What happens if your developers write code locally on their laptops but use AWS in staging and production? Spinning up a local copy of an RDS instance isn't really feasible, so they run a database in a Docker container instead. Now your devs have two different workflows for deploying a database.

What happens when your company starts an initiative to move from AWS to Azure? Well now your module needs to know how to provision databases in Azure too and a lot of that underlying networking information is going to be completely different.

The solution almost always ends up being asking developers to rewrite all of their modules, or having the platform engineering team rewrite the internal modules to include massive amounts of branching logic.

Crossplane aims to solve this differently: it separates the platform API contract and the underlying infrastructure implementation into two distinct concepts.

The Crossplane approach

Crossplane is a Kubernetes add-on. Its goal is to allow teams to build a platform that utilizes the Kubernetes API to manage anything outside of Kubernetes itself.

If we want to continue our example of provisioning a database, Crossplane provides us with two tools: CompositeResourceDefinitions (XRD) and Compositions.

CompositeResourceDefinitions

A CompositeResourceDefinition (XRD for short) is how you build the platform API contract we talked about in the previous section. XRDs look almost identical to Kubernetes CustomResourceDefinitions (CRDs); the difference is that Crossplane watches XRDs and uses them to automatically generate CRDs for you.

Let’s look at an example. This creates an XRD called xdatabases.my-company.com:

apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xdatabases.my-company.com
spec:
  group: my-company.com
  names:
    kind: XDatabase
    plural: xdatabases
  claimNames:
    kind: Database
    plural: databases
  versions:
  - name: v1alpha1
    served: true
    referenceable: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              engine:
                type: string
              engine_version:
                type: string
              size:
                type: string
            required:
              - engine
              - engine_version
              - size

If you have worked with CRDs before, this should look very familiar to you. All it is really doing is defining the shape of our Database API.

When you apply an XRD to your cluster, Crossplane will watch for it and automatically create the corresponding CRDs.

You'll get an XDatabase CRD, which is cluster-scoped, much like a PersistentVolume in Kubernetes.

If you provide claimNames in your XRD (like we did here), Crossplane will also automatically create a namespace-scoped CRD called Database. Crossplane calls these Claims, and they work much like PersistentVolumeClaims: a developer writes a Database manifest in their namespace, and Crossplane handles creating the cluster-scoped resources needed to fulfill that claim.

Compositions

Compositions tell Crossplane what actually needs to happen when someone requests a Database.

A Composition works by defining a list of steps that need to be followed, called a Composition Pipeline. In each step of the pipeline, Crossplane executes a Function (which you can also write yourself if you choose).

Let's look at an example. First, let's install the Function we want to use, function-patch-and-transform, into our cluster:

apiVersion: pkg.crossplane.io/v1beta1
kind: Function
metadata:
  name: function-patch-and-transform
spec:
  package: xpkg.upbound.io/crossplane/function-patch-and-transform:v0.3.0

That takes care of our dependencies. Now let's look at the Composition. This tells Crossplane that when someone requests an XDatabase, it should create an AWS RDS cluster to fulfill it:

apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: xdatabases-aws-rds.my-company.com
spec:
  compositeTypeRef:
    apiVersion: my-company.com/v1alpha1
    kind: XDatabase
  mode: Pipeline
  pipeline:
    - step: patch-and-transform
      functionRef:
        name: function-patch-and-transform
      input:
        apiVersion: pt.fn.crossplane.io/v1beta1
        kind: Resources
        resources:
          - name: rds-cluster
            base:
              apiVersion: rds.aws.m.upbound.io/v1beta1
              kind: Cluster
              spec:
                forProvider:
                  region: us-west-1
            patches:
              - type: FromCompositeFieldPath
                fromFieldPath: spec.engine
                toFieldPath: spec.forProvider.engine
              - type: FromCompositeFieldPath
                fromFieldPath: spec.engine_version
                toFieldPath: spec.forProvider.engineVersion

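One thing the pipeline above doesn't yet handle is spec.size. function-patch-and-transform also supports transforms on patches, so we could map our friendly size values to concrete instance classes. Here is a sketch of an additional patch; the toFieldPath and the size-to-class mapping are assumptions, and the real field name depends on the provider resource you compose:

```yaml
# Additional entry for the patches list above (a sketch).
# toFieldPath is an assumption; check your provider's schema.
- type: FromCompositeFieldPath
  fromFieldPath: spec.size
  toFieldPath: spec.forProvider.instanceClass
  transforms:
    - type: map
      map:
        small: db.t4g.small
        medium: db.t4g.medium
        large: db.t4g.large
```

With a patch like this, developers keep choosing from a small vocabulary (small/medium/large) while the platform team controls which instance classes those words actually mean.
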
Every time a user creates a Claim in their project's namespace, Crossplane invokes this pipeline at the cluster level and drives AWS toward the desired state we define. Using XRDs and Compositions in tandem gives us a single control-plane API that we can extend across arbitrary cloud providers and architectures.

We decouple the developer's request for infrastructure from the actual implementation. The request stays the same regardless of what infrastructure gets deployed under the hood.

A multi-cloud approach designed for local development

Alright, so we now have a new API called Database that our users can use. What does the developer’s manifest actually look like? It might look something like this:

apiVersion: my-company.com/v1alpha1
kind: Database
metadata:
  name: example-service-db
spec:
  engine: "postgres"
  engine_version: "16.4"
  size: "large"

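The application will also need credentials to reach the database. A Claim can ask Crossplane to write connection details into a Secret in the developer's namespace. This is a sketch that assumes the Composition actually publishes connection details; the Secret name here is made up:

```yaml
apiVersion: my-company.com/v1alpha1
kind: Database
metadata:
  name: example-service-db
spec:
  engine: "postgres"
  engine_version: "16.4"
  size: "large"
  # Connection details (host, username, password) land in a Secret
  # in this namespace, assuming the Composition publishes them.
  writeConnectionSecretToRef:
    name: example-service-db-conn
```

The application then mounts or references that Secret like any other, with no cloud-specific credential plumbing in the developer's manifests.
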
A few immediate benefits jump out. Since we are using the Kubernetes API, we can group this manifest with the rest of the development team's Kubernetes manifests. For example, their database definition can sit in the same codebase directly alongside their Deployment and Service manifests.

And because there is no longer a distinction between AWS resources and Kubernetes resources, we don't need different tooling to manage them. Tools like ArgoCD can pick up these manifests and sync them exactly as they would any other Kubernetes resource. The team no longer needs CI jobs that run terraform apply, because ArgoCD treats the AWS resources identically to native Kubernetes resources.

Since Crossplane Functions are completely customizable, if a team has an edge case you can support it individually, without impacting other teams or baking overly complicated branching logic into your whole fleet. That enables a significantly better developer experience without the massive management overhead we would have otherwise.

There's another benefit, though, that is the real game-changer: Compositions can be selected conditionally depending on the cluster using them.

By default, when Crossplane encounters a Database claim, it looks for a Composition in the cluster that can fulfill it. By explicitly labeling Compositions with their intended target environments, we can instruct our production and staging clusters to fulfill the claim with an AWS RDS Composition.

We could ALSO, however, provide developer compute environments like vcluster with a Composition that installs PostgreSQL directly onto the Kubernetes cluster via CloudNativePG (CNPG) when it sees the exact same Database claim, without touching AWS at all.
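
One way to wire this up is label-based selection. The sketch below labels a hypothetical CNPG-backed Composition and has a claim select it via compositionSelector; the label key and Composition name are assumptions, and the actual pipeline steps are elided:

```yaml
# A CNPG-backed Composition, labeled for local clusters (a sketch).
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: xdatabases-cnpg.my-company.com
  labels:
    my-company.com/environment: local
spec:
  compositeTypeRef:
    apiVersion: my-company.com/v1alpha1
    kind: XDatabase
  mode: Pipeline
  pipeline: [] # steps that render CNPG resources would go here
---
# The claim selects a Composition by label.
apiVersion: my-company.com/v1alpha1
kind: Database
metadata:
  name: example-service-db
spec:
  compositionSelector:
    matchLabels:
      my-company.com/environment: local
  engine: "postgres"
  engine_version: "16.4"
  size: "large"
```

In practice you would typically set the selector (or a default Composition) at the platform level per cluster, so developers don't have to change their manifests between environments.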

If handled correctly, the development team has a single manifest that defines what database their service needs, and they never need to know or care what handles the provisioning on the other side. They get locally accessible databases when developing locally and compliant, managed databases when pushing code to the cloud, all with the exact same workflow, completely transparent to how the platform handles it. The value proposition here shouldn't be underestimated for organizations that face strict and varied compliance requirements in the cloud but don't want to slow developer velocity.