From DevOps to Platform Engineering: The Evolution of Infrastructure
"A personal journey transitioning from traditional DevOps to Platform Engineering. Discussing Team Topologies, Cognitive Load, Thinnest Viable Platform (TVP), and why 'You Build It, You Run It' shouldn't mean 'You Page It'."
For years, I lived in the trenches of DevOps. My days were a mix of wrangling massive on-premises datacenter clusters and managing sleek, auto-scaling fleets on Google Cloud Platform (GCP). It was exciting, but it was also exhausting.
I saw the “You Build It, You Run It” mantra turn into “You Build It, You Run It, You Secure It, You Patch It, and You Wake Up at 3 AM When the Redis Pod Evicts Itself.”
We told developers they were now empowered. In reality, we just dumped a truckload of complexity on their desks—Kubernetes YAML, Terraform state locks, IAM roles, subnet masks—and wished them luck. I wasn’t just an engineer; I was a bottleneck with a backlog of Jira tickets titled “Please fix my CI pipeline.”
This realization led me to Platform Engineering—a shift not just in tools, but in mindset. It’s about moving from being a gatekeeper to being a product owner for your internal developers.
Here are the critical concepts I’ve learned on this journey, minus the marketing fluff.
1. The Core “Why”: Cognitive Load Theory
Why does every company suddenly have a “Platform Team”? It’s not because Kubernetes is cool (okay, maybe a little). It’s because of Cognitive Load.
In cognitive psychology—and applied to software via Team Topologies—there are three types of load:
- Intrinsic Load: The difficulty of the task itself (e.g., “How do I write this Go function?”). This is good.
- Germane Load: The effort to learn new things (e.g., “How does our business domain work?”). This is great.
- Extraneous Load: The distractions (e.g., “Why is the Jenkins agent offline?”, “What is the correct subnet for this VPC?”, “How do I rotate these AWS keys?”). This is the enemy.
In the early DevOps days, we maxed out developers’ Extraneous Load. We forced them to become part-time SysAdmins.
Platform Engineering exists to massacre Extraneous Load. We abstract the boring, repetitive, dangerous stuff so developers can focus on business logic (Intrinsic) and domain learning (Germane).
2. Team Topologies: Humans Over Tools
If you haven’t read Team Topologies by Matthew Skelton and Manuel Pais, stop reading this blog and go buy it. It defines the modern organizational structure better than I ever could.
The key interaction is between:
- Stream-aligned Teams: The product teams building features. They should be able to move fast and break things (safely).
- Platform Teams: The team building the internal product. Their goal? Reduce the cognitive load of the Stream-aligned teams.
The interaction mode should be X-as-a-Service. The Platform Team provides a service (an API, a CLI, a Portal) that the Stream-aligned team consumes.
Crucial: If the Stream-aligned team has to open a Jira ticket to get a database, that is NOT self-service. That is just “Ops with a different name.”
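To make the contrast with a ticket queue concrete, here is a minimal sketch of what X-as-a-Service might look like from the Stream-aligned team's side. Everything in it is hypothetical: `PlatformClient`, `request_database`, and the size catalogue are names I made up for illustration, and the in-memory dict stands in for whatever real provisioning your platform does.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical size catalogue curated by the platform team (an assumption for this sketch).
ALLOWED_SIZES = {"small": 1, "medium": 4, "large": 16}  # size name -> vCPUs


@dataclass
class Database:
    id: str
    team: str
    engine: str
    vcpus: int
    status: str = "provisioning"


@dataclass
class PlatformClient:
    """Stand-in for a platform API; a real one would drive Terraform, a cloud API, etc."""
    _databases: dict = field(default_factory=dict)

    def request_database(self, team: str, engine: str = "postgres", size: str = "small") -> Database:
        # Guardrails live in the platform, so nobody has to review a ticket.
        if size not in ALLOWED_SIZES:
            raise ValueError(f"unknown size {size!r}; choose one of {sorted(ALLOWED_SIZES)}")
        db = Database(id=str(uuid.uuid4()), team=team, engine=engine, vcpus=ALLOWED_SIZES[size])
        self._databases[db.id] = db
        return db


client = PlatformClient()
db = client.request_database(team="checkout", size="medium")
print(db.engine, db.vcpus, db.status)
```

The point is the shape of the interaction, not the implementation: the developer calls something, gets a resource back, and the rules are enforced by code instead of by a person reading a backlog.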
3. IDP vs. Portal: Stop Confusing Them
Terminology matters. People often use Internal Developer Platform (IDP) and Internal Developer Portal interchangeably. They shouldn’t.
- The IDP (Platform): The engine. It’s the sum of all your tech—Kubernetes clusters, Terraform modules, CI/CD pipelines, IAM policies, and the glue code that holds it together. It does the heavy lifting.
- The Portal (Interface): The UI. This is Backstage, Port, or Cortex. It’s the “single pane of glass” (I hate that phrase, but it fits) where developers interact with the IDP.
Analogy time: The IDP is the kitchen in a restaurant (ovens, chefs, ingredients). The Portal is the menu and the waiter.
You can have a great IDP without a Portal (a solid CLI or API is fine). But you can’t have a useful Portal without a functioning IDP underneath. A Backstage instance that just links to Jira is useless.
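One way to picture the split is a rough sketch in which the platform logic lives in one place and the interfaces are thin wrappers around it. The function names (`create_service`, `cli`, `portal_handler`) and the returned fields are invented for this example and do not correspond to Backstage or any other real tool.

```python
import json


# The "kitchen": hypothetical platform logic, independent of any interface.
def create_service(name: str, language: str) -> dict:
    return {
        "name": name,
        "language": language,
        "pipeline": f"ci/{name}",        # pretend a CI pipeline was wired up
        "dashboard": f"metrics/{name}",  # pretend a dashboard was created
    }


# Interface 1: a CLI-style entry point that takes arguments and returns JSON text.
def cli(argv: list) -> str:
    name, language = argv
    return json.dumps(create_service(name, language), sort_keys=True)


# Interface 2: a portal-style handler that takes a form payload and returns a response.
def portal_handler(form: dict) -> dict:
    return {"status": 201, "body": create_service(form["name"], form["language"])}


print(cli(["payments", "go"]))
print(portal_handler({"name": "payments", "language": "go"})["status"])
```

Both interfaces produce the same result because they call the same engine. Delete `portal_handler` and the platform still works; delete `create_service` and the portal has nothing to serve.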
4. Thinnest Viable Platform (TVP)
There is a dangerous trap in Platform Engineering: Over-engineering.
You decide to build a platform. You spend 6 months setting up Backstage, writing custom plugins, and creating a globally distributed multi-cloud service mesh. You launch it to great fanfare.
And nobody uses it. Because all they wanted was a simpler way to restart a pod.
Start with the Thinnest Viable Platform (TVP).
- Maybe your “Platform” starts as a really good Wiki page with copy-paste commands.
- Then it evolves into a CLI tool that wraps those commands.
- Then it becomes a Terraform module they can import.
- Eventually, it becomes a full-blown self-service Portal.
Don’t build a Ferrari when your team needs a skateboard.
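As an illustration of the "CLI that wraps the wiki commands" stage, here is a minimal sketch of a restart command. The tool name `plat` and its flags are made up for this example; the only real thing it leans on is `kubectl rollout restart`, and it assumes your services run as Kubernetes Deployments. It defaults to a dry run that prints the command instead of executing it.

```python
import argparse
import shlex
import subprocess


def build_restart_command(service: str, namespace: str) -> list:
    # `kubectl rollout restart` triggers a rolling restart of a Deployment.
    return ["kubectl", "rollout", "restart", f"deployment/{service}", "-n", namespace]


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(prog="plat", description="Hypothetical platform CLI")
    parser.add_argument("service", help="name of the Deployment to restart")
    parser.add_argument("-n", "--namespace", default="default")
    parser.add_argument("--execute", action="store_true",
                        help="run kubectl instead of only printing the command")
    args = parser.parse_args(argv)

    cmd = build_restart_command(args.service, args.namespace)
    if not args.execute:
        print(shlex.join(cmd))  # dry run by default
        return 0
    return subprocess.run(cmd).returncode


# prints: kubectl rollout restart deployment/checkout-api -n shop
main(["checkout-api", "--namespace", "shop"])
```

That is roughly thirty lines, and it already answers the "simpler way to restart a pod" request. Anything fancier can wait until someone actually asks for it.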
5. Golden Paths (Not Golden Cages)
Developers hate being told “no.” They love being told “this way is faster.”
- Golden Cages: “You MUST use Jenkins. You MUST use Java 17. You CANNOT use Lambda.” Usage is enforced by policy and rage.
- Golden Paths: “If you use our standard Go template, you get free CI/CD, automatic metrics, and 24/7 support. If you want to use Rust, go ahead, but you’re on your own for the pipeline.”
The Golden Path offers an Opinionated, Supported, and Easy way to do things. Most developers will choose the path of least resistance. It’s not about restriction; it’s about convenience.
“Make the right thing to do the easiest thing to do.”
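Here is a small sketch of how that difference might show up in code. The catalogue, the `ServicePlan` type, and the `plan_service` function are all hypothetical; the idea is only that going off the path is allowed and visible, not blocked.

```python
from dataclasses import dataclass

# Hypothetical golden-path catalogue: language -> what the platform provides for free.
GOLDEN_PATHS = {
    "go": {"ci_pipeline": True, "metrics": True, "support": "24/7"},
}


@dataclass
class ServicePlan:
    name: str
    language: str
    ci_pipeline: bool
    metrics: bool
    support: str


def plan_service(name: str, language: str) -> ServicePlan:
    perks = GOLDEN_PATHS.get(language.lower())
    if perks is None:
        # Off the path is permitted; the team simply owns more of the work itself.
        return ServicePlan(name, language, ci_pipeline=False, metrics=False,
                           support="self-supported")
    return ServicePlan(name, language, **perks)


print(plan_service("checkout", "Go"))
print(plan_service("ranker", "Rust"))
```

Notice there is no `raise` for Rust. A Golden Cage would throw an error on that branch; a Golden Path just makes it obvious what you give up.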
6. Product Mindset: Managing the “Internal Startup”
This was the hardest shift for me. In the datacenter, my “customers” were servers that needed patching. In Platform Engineering, my customers are my colleagues—and they can be tough customers.
We have to treat the platform as a Product:
- User Research: Don’t guess what hurts. Go sit with a dev team for a day. Watch them struggle with AWS console errors. That’s your roadmap.
- Marketing: You have to sell your platform. Evangelize new features in All-Hands meetings. Write release notes that people actually read.
- Metrics: How do you measure success?
  - Adoption Rate: What % of new services use the Golden Path?
  - DORA Metrics: Did deployment frequency go up? Did Change Failure Rate go down?
  - NPS: Do developers actually like using your tools?
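The first two of those metrics are simple arithmetic once you have the data. Below is a minimal sketch, assuming you can export a list of deployments and a flag per service for "uses the Golden Path". The record shape, the function names, and the sample data are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Deployment:
    service: str
    day: date
    failed: bool  # True if the change caused a failure that needed remediation


def adoption_rate(services: dict) -> float:
    """Share of services on the Golden Path; `services` maps name -> uses golden path."""
    return sum(services.values()) / len(services) if services else 0.0


def deployment_frequency(deployments: list, days: int) -> float:
    """Average deployments per day over a window of `days` days."""
    return len(deployments) / days


def change_failure_rate(deployments: list) -> float:
    """Fraction of deployments that resulted in a failure."""
    if not deployments:
        return 0.0
    return sum(d.failed for d in deployments) / len(deployments)


# Made-up sample data, for illustration only.
services = {"checkout": True, "search": True, "ranker": False, "billing": True}
deployments = [
    Deployment("checkout", date(2024, 1, 1), failed=False),
    Deployment("checkout", date(2024, 1, 2), failed=True),
    Deployment("search", date(2024, 1, 3), failed=False),
    Deployment("billing", date(2024, 1, 5), failed=False),
]

print(f"adoption: {adoption_rate(services):.0%}")
print(f"deploys/day: {deployment_frequency(deployments, days=7):.2f}")
print(f"change failure rate: {change_failure_rate(deployments):.0%}")
```

The numbers matter less than the trend. Track them before and after a platform change and you have something better than anecdotes to bring to the next All-Hands.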
7. Summary: The 10x Organization
The myth of the “10x Engineer” is dying. The new goal is the 10x Organization.
Platform Engineering is the lever that makes that possible. By removing the friction of infrastructure, we unlock the creativity of the entire engineering organization.
It’s not about hiding complexity because we think developers aren’t smart enough to handle it. They are. It’s about hiding complexity because their brain cycles are too valuable to be spent debugging YAML indentation errors.
Welcome to the future of Ops. It’s cleaner, it’s faster, and (usually) fewer pagers go off at 3 AM.