Writing a Kubernetes CSI Driver: Controller and Node from Scratch

Kubernetes storage is one of those areas that looks simple from the outside — you create a PersistentVolumeClaim, a pod mounts it, done. But the moment you need to integrate your own storage backend, you’re staring at the Container Storage Interface spec, sidecar containers you’ve never heard of, and gRPC services that have to be wired together just right.

I had the pleasure of writing and contributing to a production-grade CSI driver end-to-end. This post covers everything I wish I had in one place: what CSI actually is, how Kubernetes orchestrates it, and how to implement all three services in Go.

What is CSI?

CSI (Container Storage Interface) is a standardized gRPC API between Kubernetes and storage providers. Before CSI existed, storage drivers were compiled directly into Kubernetes — adding a new one meant patching the Kubernetes tree. If you remember a few Kubernetes versions ago where your non-Azure Kubernetes is actually logging non-stop that Azure is not working? Yeah, that kind of problem due to coupling.

CSI moved drivers out-of-tree: your storage backend ships its own binary, Kubernetes talks to it over a Unix socket.

The spec defines three gRPC services:

IdentityService — who are you, are you healthy?
ControllerService — manage volumes at the infrastructure level (create, delete, attach, snapshot)
NodeService — manage volumes on a specific node (format, mount, bind-mount into pods)

Your driver implements these. Kubernetes provides sidecar containers that translate Kubernetes events into the right RPC calls.

Architecture Overview

Two separate binaries run in the cluster:

graph TB
    subgraph CP["Controller Pod — Deployment, 1 replica"]
        PROV[csi-provisioner]
        ATT[csi-attacher]
        SNAP[csi-snapshotter]
        RES[csi-resizer]
        CLIVE[liveness-probe]
        CDRV["Your Driver\nControllerService + IdentityService"]

        PROV -->|gRPC unix socket| CDRV
        ATT -->|gRPC unix socket| CDRV
        SNAP -->|gRPC unix socket| CDRV
        RES -->|gRPC unix socket| CDRV
        CLIVE -->|gRPC unix socket| CDRV
    end

    subgraph NP["Node Pod — DaemonSet, 1 per node"]
        REG[node-driver-registrar]
        NLIVE[liveness-probe]
        NDRV["Your Driver\nNodeService + IdentityService"]

        REG -->|gRPC unix socket| NDRV
        NLIVE -->|gRPC unix socket| NDRV
    end

    KBL[kubelet] -->|gRPC unix socket| NDRV
    REG -->|registers socket path| KBL

The controller pod runs centrally and manages volume lifecycle against your storage backend API. The node pod runs on every machine and handles the actual OS-level work: formatting disks, mounting filesystems, bind-mounting into pods.

They communicate via Unix domain sockets, not TCP. Each sidecar and the driver share an emptyDir (controller) or hostPath (node) volume where the socket lives.

How Kubernetes Orchestrates CSI

The sidecars are the glue. You don’t call your driver directly — the sidecars watch Kubernetes API resources and translate events into gRPC calls.

Sidecar	Watches	Triggers RPC
`csi-provisioner`	PersistentVolumeClaim	`CreateVolume` / `DeleteVolume`
`csi-attacher`	VolumeAttachment	`ControllerPublishVolume` / `ControllerUnpublishVolume`
`csi-snapshotter`	VolumeSnapshot	`CreateSnapshot` / `DeleteSnapshot`
`csi-resizer`	PVC resize	`ControllerExpandVolume`
`node-driver-registrar`	startup	registers socket with kubelet
`kubelet`	Pod scheduling	`NodeStageVolume` / `NodePublishVolume`

kubelet talks to the node plugin directly after node-driver-registrar registers the socket path at /var/lib/kubelet/plugins_registry/.

The Full Volume Lifecycle

Provision → Mount

sequenceDiagram
    actor User
    participant K8s as Kubernetes API
    participant PROV as csi-provisioner
    participant ADC as AttachDetachController
    participant ATT as csi-attacher
    participant CTRL as ControllerService
    participant BACK as Storage Backend
    participant KBL as kubelet
    participant NODE as NodeService

    User->>K8s: create PVC (16Gi, RWO)
    K8s->>PROV: PVC event
    PROV->>CTRL: CreateVolume(name, 16Gi, topology)
    CTRL->>BACK: create volume
    CTRL->>BACK: poll status → "available"
    CTRL-->>PROV: CreateVolumeResponse{volumeID, capacityBytes, topology}
    PROV->>K8s: create PersistentVolume{volumeHandle=volumeID}
    K8s->>K8s: bind PV ↔ PVC

    User->>K8s: create Pod using PVC
    K8s->>ADC: Pod scheduled to node
    ADC->>K8s: create VolumeAttachment{volumeID, nodeID}
    K8s->>ATT: VolumeAttachment event
    ATT->>CTRL: ControllerPublishVolume(volumeID, nodeID)
    CTRL->>BACK: attach volume to VM
    BACK-->>CTRL: attached, devicePath=/dev/vdb
    CTRL-->>ATT: ControllerPublishVolumeResponse{PublishContext{DevicePath=/dev/vdb}}
    ATT->>K8s: VolumeAttachment.status.attached=true

    KBL->>NODE: NodeStageVolume(volumeID, stagingPath, PublishContext)
    NODE->>NODE: mkfs.ext4 /dev/vdb
    NODE->>NODE: mount /dev/vdb → stagingPath
    NODE-->>KBL: NodeStageVolumeResponse{}

    KBL->>NODE: NodePublishVolume(volumeID, targetPath, stagingPath)
    NODE->>NODE: mount --bind stagingPath → targetPath
    NODE-->>KBL: NodePublishVolumeResponse{}
    K8s-->>User: Pod Running, /data mounted

Three distinct paths to understand here:

Provision (csi-provisioner → CreateVolume) — creates the volume in your backend, Kubernetes creates a PV.
Attach (csi-attacher → ControllerPublishVolume) — attaches the block device to the VM running the pod. Returns the device path (e.g. /dev/vdb) in PublishContext.
Stage + Publish (kubelet → NodeStageVolume + NodePublishVolume) — formats and mounts the device to a staging path, then bind-mounts that into the pod’s specific target path.

The staging/publish split exists so multiple pods on the same node can share one formatted device via bind mounts, rather than formatting once per pod.

Unmount → Delete

sequenceDiagram
    actor User
    participant K8s as Kubernetes API
    participant PROV as csi-provisioner
    participant ADC as AttachDetachController
    participant ATT as csi-attacher
    participant CTRL as ControllerService
    participant BACK as Storage Backend
    participant KBL as kubelet
    participant NODE as NodeService

    User->>K8s: delete Pod
    KBL->>NODE: NodeUnpublishVolume(volumeID, targetPath)
    NODE->>NODE: umount targetPath
    NODE-->>KBL: NodeUnpublishVolumeResponse{}

    KBL->>NODE: NodeUnstageVolume(volumeID, stagingPath)
    NODE->>NODE: umount stagingPath
    NODE-->>KBL: NodeUnstageVolumeResponse{}

    ADC->>K8s: delete VolumeAttachment
    K8s->>ATT: VolumeAttachment deletion event
    ATT->>CTRL: ControllerUnpublishVolume(volumeID, nodeID)
    CTRL->>BACK: detach volume from VM
    CTRL-->>ATT: ControllerUnpublishVolumeResponse{}

    User->>K8s: delete PVC
    K8s->>PROV: PVC deletion event
    PROV->>CTRL: DeleteVolume(volumeID)
    CTRL->>BACK: delete volume
    CTRL-->>PROV: DeleteVolumeResponse{}
    PROV->>K8s: delete PersistentVolume

Exact reverse order: unpublish → unstage → detach → delete. Each step must succeed before the next begins.

Snapshots

sequenceDiagram
    actor User
    participant K8s as Kubernetes API
    participant SNAP as csi-snapshotter
    participant PROV as csi-provisioner
    participant CTRL as ControllerService
    participant BACK as Storage Backend

    User->>K8s: create VolumeSnapshot{source=PVC}
    K8s->>SNAP: VolumeSnapshot event
    SNAP->>CTRL: CreateSnapshot(name, sourceVolumeID)
    CTRL->>BACK: create snapshot
    CTRL->>BACK: poll status → "available"
    CTRL-->>SNAP: CreateSnapshotResponse{snapshotID, readyToUse=true}
    SNAP->>K8s: create VolumeSnapshotContent{snapshotHandle=snapshotID}
    K8s-->>User: VolumeSnapshot Ready=true

    User->>K8s: create PVC{dataSource=VolumeSnapshot}
    K8s->>PROV: PVC event
    PROV->>CTRL: CreateVolume(name, size, contentSource={snapshotID})
    CTRL->>BACK: create volume from snapshot
    CTRL-->>PROV: CreateVolumeResponse{volumeID, resizeRequired=true}
    Note over PROV,K8s: NodeStageVolume will resize filesystem

Volume Expansion

sequenceDiagram
    actor User
    participant K8s as Kubernetes API
    participant RES as csi-resizer
    participant CTRL as ControllerService
    participant BACK as Storage Backend
    participant KBL as kubelet
    participant NODE as NodeService

    User->>K8s: patch PVC 16Gi → 32Gi
    K8s->>RES: PVC resize event
    RES->>CTRL: ControllerExpandVolume(volumeID, 32Gi)
    CTRL->>BACK: expand volume
    CTRL->>BACK: poll → new size confirmed
    CTRL-->>RES: ControllerExpandVolumeResponse{nodeExpansionRequired=true}
    RES->>K8s: update PV capacity = 32Gi

    KBL->>NODE: NodeExpandVolume(volumeID, volumePath, 32Gi)
    NODE->>NODE: rescan block device geometry
    NODE->>NODE: resize2fs /dev/vdb
    NODE-->>KBL: NodeExpandVolumeResponse{}
    K8s-->>User: PVC capacity = 32Gi

Implementing the Driver in Go

Project Structure

cmd/
  controller/main.go    # wires ControllerService + IdentityService
  node/main.go          # wires NodeService + IdentityService
internal/
  server/server.go      # Unix socket listener + gRPC server
  driver/
    identity.go
    controller.go       # struct + capability declarations
    controller_volume.go
    controller_snapshot.go
    node.go
    mounter.go          # interface wrapping k8s.io/mount-utils
    metadata.go         # node instance ID + AZ

Two binaries, one shared internal/driver package. The controller binary registers ControllerServer + IdentityServer. The node binary registers NodeServer + IdentityServer.

The gRPC Server

func CreateListener(endpoint string) (net.Listener, error) {
    path := strings.TrimPrefix(endpoint, "unix://")
    _ = os.Remove(path)

    return net.Listen("unix", path)
}

func CreateGRPCServer(logger *slog.Logger) *grpc.Server {
    return grpc.NewServer(
        grpc.ChainUnaryInterceptor(requestLogger(logger)),
    )
}

// cmd/controller/main.go — error handling omitted for brevity.
listener, _ := internal.CreateListener(cfg.CSIEndpoint)
grpcServer := internal.CreateGRPCServer(logger)
csiproto.RegisterControllerServer(grpcServer, controllerService)
csiproto.RegisterIdentityServer(grpcServer, identityService)
grpcServer.Serve(listener)

IdentityService

The simplest service. Declares what the driver supports and responds to health checks.

type IdentityService struct {
    csiproto.UnimplementedIdentityServer
    logger *slog.Logger
    ready  atomic.Bool
}

func (is *IdentityService) GetPluginInfo(_ context.Context, _ *csiproto.GetPluginInfoRequest) (*csiproto.GetPluginInfoResponse, error) {
    return &csiproto.GetPluginInfoResponse{
        Name:          "csi.example.com",
        VendorVersion: "1.0.0",
    }, nil
}

func (is *IdentityService) GetPluginCapabilities(_ context.Context, _ *csiproto.GetPluginCapabilitiesRequest) (*csiproto.GetPluginCapabilitiesResponse, error) {
    return &csiproto.GetPluginCapabilitiesResponse{
        Capabilities: []*csiproto.PluginCapability{
            {Type: &csiproto.PluginCapability_Service_{
                Service: &csiproto.PluginCapability_Service{
                    Type: csiproto.PluginCapability_Service_CONTROLLER_SERVICE,
                },
            }},
        },
    }, nil
}

func (is *IdentityService) Probe(_ context.Context, _ *csiproto.ProbeRequest) (*csiproto.ProbeResponse, error) {
    return &csiproto.ProbeResponse{
        Ready: &wrapperspb.BoolValue{Value: is.ready.Load()},
    }, nil
}

Call is.SetReady(true) after everything is wired up, just before grpcServer.Serve.

ControllerService

The controller manages your storage backend. The struct holds clients to your backend APIs:

type ControllerService struct {
    csiproto.UnimplementedControllerServer
    logger    *slog.Logger
    volClient VolumeAPIClient
    vmClient  VMAPIClient
    region    string
}

CreateVolume must be idempotent — csi-provisioner will retry. Check by name first:

func (cs *ControllerService) CreateVolume(ctx context.Context, req *csiproto.CreateVolumeRequest) (*csiproto.CreateVolumeResponse, error) {
    if req.GetName() == "" {
        return nil, status.Error(codes.InvalidArgument, "missing name")
    }

    existing, err := cs.volClient.GetByName(ctx, req.GetName())
    if err == nil {
        return buildCreateResponse(existing), nil
    }


    sizeGB := bytesToGB(req.GetCapacityRange().GetRequiredBytes())

    vol, err := cs.volClient.Create(ctx, CreateVolumeParams{
        Name:             req.GetName(),
        SizeGB:           sizeGB,
        AvailabilityZone: azFromTopology(req.GetAccessibilityRequirements()),
    })
    if err != nil {
        return nil, status.Errorf(codes.Internal, "create volume failed: %v", err)
    }

    err = cs.waitForStatus(ctx, vol.ID, "available")
    if err != nil {
        return nil, status.Errorf(codes.Internal, "volume not ready: %v", err)
    }

    return buildCreateResponse(vol), nil
}

The exponential backoff wait is critical — most storage APIs are asynchronous:

func (cs *ControllerService) waitForStatus(ctx context.Context, volumeID, target string) error {
    backoff := wait.Backoff{
        Duration: time.Second,
        Factor:   1.1,
        Steps:    15,
    }
    return wait.ExponentialBackoffWithContext(ctx, backoff, func(ctx context.Context) (bool, error) {
        vol, err := cs.volClient.Get(ctx, volumeID)
        if err != nil {
            return false, err
        }
        return vol.Status == target, nil
    })
}

ControllerPublishVolume attaches the block device to the VM. The key output is PublishContext — this is how the device path travels from the controller to the node:

func (cs *ControllerService) ControllerPublishVolume(ctx context.Context, req *csiproto.ControllerPublishVolumeRequest) (*csiproto.ControllerPublishVolumeResponse, error) {
    volumeID := req.GetVolumeId()
    nodeID := req.GetNodeId()

    err = cs.vmClient.AttachVolume(ctx, nodeID, volumeID)
    if err != nil {
        return nil, status.Errorf(codes.Internal, "attach failed: %v", err)
    }

    err = cs.waitUntilAttached(ctx, nodeID, volumeID)
    if err != nil {
        return nil, status.Errorf(codes.Internal, "attach timeout: %v", err)
    }

    devicePath, err := cs.volClient.GetDevicePath(ctx, volumeID, nodeID)
    if err != nil {
        return nil, status.Errorf(codes.Internal, "device path: %v", err)
    }

    return &csiproto.ControllerPublishVolumeResponse{
        PublishContext: map[string]string{
            "DevicePath": devicePath,
        },
    }, nil
}

NodeService

The node runs with elevated privileges on every machine. It receives the PublishContext from the controller and turns it into real mounts.

NodeStageVolume formats (if needed) and mounts to a shared staging path:

func (ns *NodeService) NodeStageVolume(ctx context.Context, req *csiproto.NodeStageVolumeRequest) (*csiproto.NodeStageVolumeResponse, error) {
    stagingTarget := req.GetStagingTargetPath()
    devicePath := req.GetPublishContext()["DevicePath"]

    notMnt, err := ns.mounter.IsLikelyNotMountPointAttach(stagingTarget)
    if err != nil {
        return nil, status.Error(codes.Internal, err.Error())
    }

    if notMnt {
        fsType := "ext4"
        if mnt := req.GetVolumeCapability().GetMount(); mnt != nil && mnt.GetFsType() != "" {
            fsType = mnt.GetFsType()
        }

        err = ns.mounter.FormatAndMount(devicePath, stagingTarget, fsType, nil)
        if err != nil {
            return nil, status.Error(codes.Internal, err.Error())
        }
    }

    return &csiproto.NodeStageVolumeResponse{}, nil
}

NodePublishVolume bind-mounts the staging path into the pod’s specific directory:

func (ns *NodeService) NodePublishVolume(ctx context.Context, req *csiproto.NodePublishVolumeRequest) (*csiproto.NodePublishVolumeResponse, error) {
    targetPath := req.GetTargetPath()
    stagingPath := req.GetStagingTargetPath()

    mountOptions := []string{"bind"}
    if req.GetReadonly() {
        mountOptions = append(mountOptions, "ro")
    } else {
        mountOptions = append(mountOptions, "rw")
    }

    err := ns.mounter.Mount(stagingPath, targetPath, "ext4", mountOptions)
    if err != nil {
        return nil, status.Errorf(codes.Internal, "bind mount failed: %v", err)
    }


    return &csiproto.NodePublishVolumeResponse{}, nil
}

NodeGetInfo is called by kubelet at startup to learn the node’s identity and topology. This is how topology.kubernetes.io/zone gets set:

func (ns *NodeService) NodeGetInfo(ctx context.Context, _ *csiproto.NodeGetInfoRequest) (*csiproto.NodeGetInfoResponse, error) {
    instanceID, err := ns.metadata.GetInstanceID()
    if err != nil {
        return nil, status.Errorf(codes.Internal, "instance ID: %v", err)
    }

    zone, err := ns.metadata.GetAvailabilityZone()
    if err != nil {
        return nil, status.Errorf(codes.Internal, "availability zone: %v", err)
    }


    return &csiproto.NodeGetInfoResponse{
        NodeId: instanceID,
        AccessibleTopology: &csiproto.Topology{
            Segments: map[string]string{
                "topology.kubernetes.io/zone": zone,
            },
        },
    }, nil
}

The Mounter Interface

Rather than calling mount directly, wrap k8s.io/mount-utils. This makes the node service testable with a fake mounter:

type Mounter interface {
    FormatAndMount(source, target, fsType string, options []string) error
    Mount(source, target, fsType string, options []string) error
    UnmountPath(mountPath string) error
    IsLikelyNotMountPointAttach(path string) (bool, error)
    GetDevicePath(ctx context.Context, volumeID string) (string, error)
    Execer() exec.Interface
}

func NewMounter() Mounter {
    return &mounter{
        BaseMounter: mountutil.New(""),
        exec:        exec.New(),
    }
}

Deployment

Controller Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: csi-controller
spec:
  replicas: 1
  template:
    spec:
      containers:
      - name: csi-provisioner
        image: registry.k8s.io/sig-storage/csi-provisioner:v5.2.0
        args: ["--csi-address=/csi/csi.sock", "--leader-election"]
        volumeMounts:
        - name: socket-dir
          mountPath: /csi

      - name: csi-attacher
        image: registry.k8s.io/sig-storage/csi-attacher:v4.8.0
        args: ["--csi-address=/csi/csi.sock", "--leader-election"]
        volumeMounts:
        - name: socket-dir
          mountPath: /csi

      - name: csi-driver
        image: your-registry/csi-controller:latest
        env:
        - name: CSI_ENDPOINT
          value: unix:///csi/csi.sock
        volumeMounts:
        - name: socket-dir
          mountPath: /csi

      volumes:
      - name: socket-dir
        emptyDir: {}  # shared between all sidecars in this pod

Node DaemonSet

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: csi-node
spec:
  template:
    spec:
      containers:
      - name: node-driver-registrar
        image: registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.13.0
        args:
        - --csi-address=/csi/csi.sock
        - --kubelet-registration-path=/var/lib/kubelet/plugins/csi.example.com/csi.sock
        volumeMounts:
        - name: socket-dir
          mountPath: /csi
        - name: registration-dir
          mountPath: /registration

      - name: csi-driver
        image: your-registry/csi-node:latest
        securityContext:
          privileged: true
          capabilities:
            add: ["SYS_ADMIN"]
        env:
        - name: CSI_ENDPOINT
          value: unix:///csi/csi.sock
        volumeMounts:
        - name: socket-dir
          mountPath: /csi
        - name: kubelet-dir
          mountPath: /var/lib/kubelet
          mountPropagation: Bidirectional  # essential: propagates mounts to host
        - name: dev-dir
          mountPath: /dev

      volumes:
      - name: socket-dir
        hostPath:
          path: /var/lib/kubelet/plugins/csi.example.com/
          type: DirectoryOrCreate
      - name: registration-dir
        hostPath:
          path: /var/lib/kubelet/plugins_registry/
      - name: kubelet-dir
        hostPath:
          path: /var/lib/kubelet
      - name: dev-dir
        hostPath:
          path: /dev

Two things that catch people out:

mountPropagation: Bidirectional on the kubelet dir — without this, mounts made inside the node container aren’t visible to the host, so the pod never sees the volume.
privileged: true — the node driver calls mount(2), which requires root. There’s no way around it.

Error Handling and Idempotency

The CSI spec requires every RPC to be idempotent. Kubernetes retries. You will receive CreateVolume twice with the same name. You will receive ControllerPublishVolume for an already-attached volume. Handle this by checking state before acting:

// CreateVolume — return existing volume if found by name.
existing, err := cs.volClient.GetByName(ctx, name)
if err == nil {
    return buildCreateResponse(existing), nil
}

// ControllerPublishVolume — skip attach if already attached.
if attached, _ := cs.isAttached(ctx, nodeID, volumeID); attached {
    devicePath, _ := cs.getDevicePath(ctx, volumeID, nodeID)
    return &csiproto.ControllerPublishVolumeResponse{
        PublishContext: map[string]string{"DevicePath": devicePath},
    }, nil
}

Use gRPC status codes correctly:

codes.InvalidArgument  // caller's fault, missing/bad input
codes.NotFound         // resource doesn't exist
codes.AlreadyExists    // for CreateVolume when name conflicts with different params
codes.Internal         // backend/driver error
codes.Unavailable      // transient, safe to retry

Node Registration Sequence

sequenceDiagram
    participant REG as node-driver-registrar
    participant DRV as NodeService
    participant KBL as kubelet

    REG->>DRV: GetPluginInfo() → {name, version}
    REG->>DRV: GetPluginCapabilities()
    REG->>KBL: register at /var/lib/kubelet/plugins_registry/csi.example.com-reg.sock
    KBL->>REG: NodeGetInfo request
    REG->>DRV: NodeGetInfo() → {nodeID, topology}
    REG-->>KBL: {nodeID, driverSocket=/var/lib/kubelet/plugins/csi.example.com/csi.sock}
    Note over KBL: kubelet routes volume ops
for this driver to the node socket
    KBL->>DRV: NodeStageVolume / NodePublishVolume / ...

Common Pitfalls

Volume size rounding. Many backends work in GiB, not bytes. req.GetCapacityRange().GetRequiredBytes() is in bytes. Convert carefully and round up, not down — returning a smaller volume than requested violates the spec.

NodeStageVolume vs NodePublishVolume. Stage runs once per volume per node. Publish runs once per pod. If you skip staging and try to format in publish, two pods requesting the same volume will race to run mkfs on the same device.

Blocking RPCs. All CSI RPCs are synchronous from the caller’s perspective, but backends are async. CreateVolume must not return until the volume is usable. Poll with backoff; don’t sleep for a fixed duration.

xfs and duplicate UUIDs. If you restore a snapshot to create a new volume, the new volume has the same filesystem UUID as the original. xfs refuses to mount two volumes with the same UUID on the same node. Pass nouuid as a mount option for xfs volumes:

if fsType == "xfs" {
    options = append(options, "nouuid")
}

Block volumes. If you support VolumeCapability_AccessMode_SINGLE_NODE_WRITER for raw block, NodePublishVolume needs a different code path — create a file at targetPath and bind-mount the device file, not a directory.

Testing

Unit test the controller and node services with mock backends and a fake mounter:

func TestCreateVolume_Idempotent(t *testing.T) {
    ctrl := NewControllerService(fakeVolClient, fakeVMClient, ...)

    req := &csiproto.CreateVolumeRequest{
        Name:             "test-vol",
        CapacityRange:    &csiproto.CapacityRange{RequiredBytes: 16 * 1024 * 1024 * 1024},
        VolumeCapabilities: defaultCaps(),
    }

    resp1, err := ctrl.CreateVolume(ctx, req)
    require.NoError(t, err)

    resp2, err := ctrl.CreateVolume(ctx, req)
    require.NoError(t, err)

    assert.Equal(t, resp1.GetVolume().GetVolumeId(), resp2.GetVolume().GetVolumeId())
}

For the node service, k8s.io/mount-utils ships a FakeMounter that records mount calls without touching the filesystem — use it.

Integration testing the full flow requires a real Kubernetes cluster. The Kubernetes CSI sanity test suite provides a standard set of conformance tests you can run against a live driver socket.

What You End Up With

After all of this, what you have is:

A controller binary that speaks to your storage API and manages volume/snapshot lifecycle
A node binary that runs on every node, formats disks, and manages mounts
Two Kubernetes workloads (Deployment + DaemonSet) with the right sidecar containers
StorageClass, RBAC, and a driver registration object pointing at your plugin name

The CSI spec is verbose but consistent. Once you’ve implemented CreateVolume and NodeStageVolume, the pattern repeats. The hard parts are operational: idempotency under retries, async backend polling, and getting mount propagation right in the DaemonSet.

The official CSI spec and the kubernetes-csi examples repo are the two references worth keeping open while building. Note, even though the examples repo says this is not a good example, it’s actually a decent start to get the boilerplate and setup right.

What is CSI?#

Architecture Overview#

How Kubernetes Orchestrates CSI#

The Full Volume Lifecycle#

Provision → Mount#

Unmount → Delete#

Snapshots#

Volume Expansion#

Implementing the Driver in Go#

Project Structure#

The gRPC Server#

IdentityService#

ControllerService#

NodeService#

The Mounter Interface#

Deployment#

Controller Deployment#

Node DaemonSet#

Error Handling and Idempotency#

Node Registration Sequence#

Common Pitfalls#

Testing#

What You End Up With#