Zomma-ing towards SPIFFE: a Secure Production Identity Framework for Everyone (and their Agents)
Why Zomma assigns SPIFFE IDs to the ephemeral runtime instances the orchestrator spins up: short-lived X.509 certificates from our own trust domain make every agentic action attributable to an accountable instance.
Some thoughts on our architecture.
I’ve been thinking a lot about the ephemeral runtime instances that our Zomma agents spin up as effector/worker subtask runners. Effector? Worker? Runner? God, the names we’ve been using interchangeably, internally.
Varun and I might need to have a talk to ‘formalise’ these internal definitions. I foresee myself being a massive pain to him as I try to pin them down. 😟
But if you think about it, these runtime effectors carry as much responsibility as the larger orchestrator-agent: like a worker bee to her queen. All part of a larger ecosystem; a small operand that produces the intended result.
How could the queen entrust her worker bee with the larger goal?
How could we keep a roster of which tasks were assigned to which bee?
How could we attribute an incident to a worker bee when things go wrong?
What the hell is SPIFFE?
Anyway, I stumbled upon SPIFFE.io. To just grab the sauce from the source:
SPIFFE, the Secure Production Identity Framework for Everyone, is a set of open-source standards for securely identifying software systems in dynamic and heterogeneous environments.
Wait… By design, Zomma spins up effector instances from the orchestrator to assign subtasks, and kills them whenever the subtask is complete. Our effectors are dynamically provisioned, used, abused, and then discarded. Like single-use plastics, except these don’t leave traces when dumped.
If that’s the case, how could we conduct post-mortems on incidents?
“Those who cannot remember the past are condemned to repeat it,” said George Santayana. We’re a growing startup: building, learning, failing fast and iterating on the go. If we do not find a way to learn from our mistakes, how could we maintain the trust that our partners have placed in us?
Our solution is simple: each runtime instance will now be assigned a SPIFFE ID.
Why SPIFFE?
Why? In practice, the SPIFFE ID is the name of the instance, and the SPIFFE Verifiable Identity Document (SVID) is essentially a verifiable badge carrying that name: in our case, a short-lived X.509 certificate that’s cryptographically signed by the CA within our trust domain (i.e., us, zommalabs.com).
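To make the "name" half concrete, here is a minimal Python sketch of what minting and checking a SPIFFE ID could look like. A SPIFFE ID is a URI of the form spiffe://<trust domain>/<path>. The /effector/<task>/<instance> path layout and the helper names (SpiffeId, parse_spiffe_id, new_effector_id) are my own assumptions for illustration, not how Zomma actually lays things out, and the validation is a simplified reading of the SPIFFE ID rules rather than a full implementation.

```python
import re
import uuid
from dataclasses import dataclass

TRUST_DOMAIN = "zommalabs.com"  # the trust domain named in this post

# Simplified character rules: lowercase-only trust domain, and path
# segments made of letters, digits, dots, dashes and underscores.
_TRUST_DOMAIN_RE = re.compile(r"[a-z0-9._-]+")
_SEGMENT_RE = re.compile(r"[A-Za-z0-9._-]+")


@dataclass(frozen=True)
class SpiffeId:
    trust_domain: str
    path: str  # "" or "/segment/segment/..."

    def __str__(self) -> str:
        return f"spiffe://{self.trust_domain}{self.path}"


def parse_spiffe_id(raw: str) -> SpiffeId:
    """Split a SPIFFE ID into trust domain and path, rejecting malformed input."""
    prefix = "spiffe://"
    if not raw.startswith(prefix):
        raise ValueError("scheme must be spiffe://")
    trust_domain, slash, path = raw[len(prefix):].partition("/")
    if not _TRUST_DOMAIN_RE.fullmatch(trust_domain):
        raise ValueError(f"invalid trust domain: {trust_domain!r}")
    if slash:
        for segment in path.split("/"):
            # Empty segments (including a trailing slash) and dot segments fail here.
            if segment in (".", "..") or not _SEGMENT_RE.fullmatch(segment):
                raise ValueError(f"invalid path segment: {segment!r}")
    return SpiffeId(trust_domain, slash + path)


def new_effector_id(task_id: str) -> SpiffeId:
    """Mint a unique ID for one effector instance (hypothetical path layout)."""
    if not _SEGMENT_RE.fullmatch(task_id):
        raise ValueError(f"task id is not a valid path segment: {task_id!r}")
    return SpiffeId(TRUST_DOMAIN, f"/effector/{task_id}/{uuid.uuid4()}")
```

Calling new_effector_id("task-42") yields something like spiffe://zommalabs.com/effector/task-42/<random uuid>. The ID on its own proves nothing; it only becomes a badge once the CA embeds it in the certificate's URI SAN and signs it.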
With an SVID, we can log agentic actions and attribute each one to an accountable instance. The worker bee dies after a sting operation, but the work she touched will be there for the other bees to see.
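As a rough sketch of what "attributable" could mean in practice, the Python below stamps every logged action with the acting instance's SPIFFE ID, then filters the log by that ID during a post-mortem. The field names (ts, spiffe_id, action, target) and both function names are assumptions for illustration; a real pipeline would also want to take the ID from the verified SVID rather than from a string the instance supplies itself.

```python
import json
from datetime import datetime, timezone


def audit_record(spiffe_id, action, target, now=None):
    """Return one JSON log line that ties an action to the instance behind it."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return json.dumps(
        {"ts": timestamp, "spiffe_id": spiffe_id, "action": action, "target": target},
        sort_keys=True,
    )


def actions_by(log_lines, spiffe_id):
    """Post-mortem helper: every recorded action taken by one instance."""
    records = (json.loads(line) for line in log_lines)
    return [record for record in records if record["spiffe_id"] == spiffe_id]
```

Because the record outlives the instance, the log still answers "who touched this?" long after the effector has been torn down.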
Are you over-engineering again?
Are we over-engineering? It’s too early to tell. But I’d rather build this early than have to undertake major overhauls mid-flight as we scale.
After all, going through two back-to-back system migrations at a national scale made me realise how much time goes into keeping the ship steady whilst ensuring that users and relying parties remain mostly oblivious to the underlying changes.