The Root Cause of "Just Automate It"
A ROOT CAUSE series post — where we dig into the decisions, the transitions, and the truth behind the hype.
You’ve heard it a thousand times.
Just automate it.
On a conference stage. In a Slack thread. From your VP who read a blog post over the weekend. From a LinkedIn influencer who automated their “entire workflow” in a 90-second video that conveniently skips the part where it actually has to work on Monday.
And you nod. Because in theory, they’re right. Automation is good. Automation saves time. Automation reduces human error.
And yet…
You’re sitting there at 11pm on a Tuesday, debugging an automation that was supposed to save you four hours a week but has instead consumed your last three sprints. The Terraform module that “just works” doesn’t account for the seven edge cases your infrastructure accumulated over four years of organic growth. The CI/CD pipeline that was “fully automated” still has that one manual approval step because nobody trusts it to deploy to production without a human looking at it first — and nobody asks why they don’t trust it. That manual gate isn’t safety. It’s a symptom. It means the automation was never finished — but everyone pretends it was.
So let’s root cause this.
The narrative
The tech industry sells automation as a binary. You’re either automated or you’re not. Modern or legacy. DevOps or “doing it wrong.”
Every tool vendor, every conference talk, every thought leader frames it the same way: here’s a problem, here’s the automation, problem solved. Next slide.
The implication is clear: if you haven’t automated it yet, you’re behind. You’re slow. You’re the bottleneck. You are the thing that needs to be automated away.
The reality
Here’s what fifteen years of building and operating systems actually taught me:
Automation doesn’t remove complexity. It moves it.
That manual runbook your team has been using for three years? It’s ugly. It requires tribal knowledge. New people hate it. But it works because a human reads the situation, makes a judgment call, and adapts when something unexpected happens.
When you automate that runbook, you don’t eliminate those judgment calls. You encode your assumptions about what those judgment calls should be. And assumptions age. Badly. The script that restarts the service assumes the database is on the same host — because it was, when someone wrote it two years ago. The failover automation assumes a single-region setup. The alerting threshold was tuned for traffic patterns that shifted three quarters ago. Every hardcoded decision in your automation is a snapshot of a reality that no longer exists.
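To make the "snapshot of a reality that no longer exists" point concrete, here's a minimal, hypothetical sketch of such a restart script. Every name and number in it is invented for illustration — the point is that each constant is a judgment call frozen at write time:

```python
#!/usr/bin/env python3
"""Hypothetical restart 'automation' of the kind described above.

Every constant below is an assumption that was true when someone
wrote it -- none of these names refer to a real system.
"""
import subprocess

# Assumption: the database runs on the same host as the service
# (it did, two years ago).
DB_HOST = "localhost"

# Assumption: single-region setup, so no failover coordination needed.
REGION = "us-east-1"

# Assumption: 500 errors/min was a sane threshold for last year's
# traffic patterns.
ERROR_THRESHOLD_PER_MIN = 500


def should_restart(errors_per_min: int) -> bool:
    # The 'judgment call' a human used to make on the fly,
    # reduced to one hardcoded comparison.
    return errors_per_min > ERROR_THRESHOLD_PER_MIN


def restart_service(name: str) -> None:
    # Assumption: the box runs systemd and this user may restart units.
    subprocess.run(["systemctl", "restart", name], check=True)
```

When traffic doubles or the database moves, nothing in this script complains. It keeps doing exactly what it was told, which is the problem.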
The infrastructure changes. The edge cases multiply. The person who wrote the automation leaves the company. And now instead of a manual process that a human can adapt in real time, you have a black box that does exactly what it was told to do eighteen months ago — which is increasingly not what you need it to do today.
Nobody talks about this part. The automation maintenance tax. The ongoing cost of keeping automated systems aligned with a reality that keeps shifting underneath them.
Enter AI: “Just automate it” on steroids
And now we have a new version of the same pitch. Louder. Shinier. With a lot more venture capital behind it.
“Just use AI for it.”
“Let the agent handle it.”
“Why are your engineers still doing this manually?”
GenAI didn’t invent the “just automate it” mindset. It turbocharged it. Because now the promise isn’t just “write a script to handle the happy path.” The promise is “the AI understands your intent, adapts to context, and figures out the edge cases for you.”
Except it doesn’t. Not really. Not yet. And maybe not in the way you think.
Here’s what actually happens when teams adopt AI-powered automation in 2025-2026:
The copilot phase: An engineer uses an AI coding assistant. Productivity goes up. Genuinely. The easy parts get easier. Boilerplate disappears. First drafts happen faster. This is real and I’m not going to pretend otherwise.
The confidence phase: Leadership sees the productivity gains and extrapolates. “If AI can write code this fast, why do we need as many engineers?” “If we can generate infrastructure-as-code with a prompt, why does provisioning take a sprint?” The LinkedIn posts start. The 90-second demos multiply.
The “and yet” phase: The AI-generated Terraform works — until it doesn’t account for your organization’s specific networking setup that evolved over four years. The AI-written code passes tests — tests that were also AI-generated and don’t cover the failure modes that only someone who’s been paged at 3am would think to test for. The agent that “handles incidents autonomously” escalates correctly 80% of the time, which sounds great until you realize the other 20% includes the incidents that actually matter.
Same pattern. Higher stakes. Because with traditional automation, at least you could read the script. You could trace the logic. You could understand why it did what it did. With an LLM-powered agent, you’re trusting a system that can’t explain its own reasoning to make decisions in your production environment. The black box just got blacker.
Agentic AI: The automation that automates itself
This is where it gets genuinely interesting — and genuinely dangerous.
The agentic AI pitch is the ultimate version of “just automate it.” Not just AI that responds to prompts, but AI that plans, executes, iterates, and chains actions together autonomously. An agent that doesn’t just write the code but also creates the PR, responds to review comments, deploys it, monitors the rollout, and rolls back if something goes wrong.
On a conference stage, this sounds like the future.
In your production environment on a Friday afternoon, this sounds like a different kind of nightmare.
Because every lesson we learned about traditional automation applies here — multiplied:
Automation doesn’t remove complexity, it moves it. Agentic AI moves it further than ever — into a system that makes decisions you didn’t explicitly program, based on patterns you can’t fully inspect, with confidence levels you can’t easily verify.
The maintenance tax compounds. When your bash script breaks, you read it and fix it. When your AI agent starts making subtly wrong decisions — deploying to the wrong environment, miscategorizing incidents, generating plausible-but-incorrect runbooks — how do you even detect that? Let alone debug it?
The understanding gap widens. This is the one that keeps me up at night. If your team automates a process with a script, they had to understand the process to write the script. If an AI agent automates a process by observing patterns in your data, nobody had to understand it. The knowledge that used to live in your team’s heads now lives nowhere accessible. And when the agent gets it wrong — who root causes the root cause tool?
Here’s the question nobody in the “agentic AI for DevOps/SRE” space wants to answer honestly: can you operate what you don’t understand?
We’ve spent twenty years in this industry arguing that developers should understand their systems end-to-end. That you should be on call for what you build. That observability matters because you need to understand what’s happening in production, not just react to it.
And now the pitch is: hand that understanding to an agent.
The real root cause hasn’t changed
I’m not an AI doomer. I use AI tools every day. Some of them are genuinely good. The coding assistants save me real time on real work. Some of the agentic workflows I’ve seen are impressive.
But here’s what I notice: the AI tools that work best for me are the ones I use after I already understand the problem. The ones that accelerate my existing knowledge, not the ones that replace it.
The AI tools that fail — for me and for every team I’ve talked to — are the ones deployed to skip the understanding.
“We don’t need to understand the legacy system, the AI will figure it out.”
“We don’t need to train juniors on incident response, the agent handles tier-1.”
“We don’t need to invest in documentation, the AI can read the code.”
That’s not a new failure mode. That’s “just automate it” wearing a different hat.
The root cause is still the same: we want to skip the understanding and jump to the solution. GenAI just made that temptation irresistible — because for the first time, the demo actually looks like it works.
The line nobody draws
Here’s where the nuance lives — and where most of the AI conversation falls apart.
There are two fundamentally different things AI can do for your team:
1. AI that replaces understanding. “The agent investigated the incident, here’s the fix, apply it.” You wake up, the problem is gone, you have no idea what happened or why. The agent was your on-call engineer, your diagnostician, and your decision-maker. You were just the human who clicked “approve.”
2. AI that accelerates understanding. “Here’s what changed in the last hour across these 14 services, here’s the correlation between this deploy and that latency spike, here are the three logs that matter out of the 200,000 that don’t.” You still investigate. You still decide. You still understand. But you got to understanding in 8 minutes instead of 45.
These sound similar. They are not.
The first one is “just automate it” for incidents. It optimizes for resolution time. The metric goes down, everyone celebrates, and six months later your team has no idea how their own systems fail because they’ve never had to figure it out themselves. Your mean time to resolve looks great. Your mean time to understand is infinite.
The second one is a force multiplier for the thing that actually matters: a human building a mental model of what went wrong and why. The AI does the grunt work — correlating signals across distributed systems, cutting through noise, surfacing what’s relevant. But the understanding stays with the human. The judgment stays with the human. The learning stays with the human.
That’s the line. And almost nobody in the AI-for-ops space draws it clearly, because “we help your team understand faster” is a harder sell than “we fix your incidents while you sleep.”
Think about it in the context of on-call. The engineer at 3am doesn’t need something to take the problem away from them. They need something that helps them see what’s happening so they can fix it — and know how to prevent it next time. An AI that makes the engineer faster at understanding is fundamentally different from an AI that makes the engineer unnecessary.
And here’s the irony: the second kind — the one that accelerates understanding — is the one that actually feels like magic. Not magic as in “the problem disappeared and I don’t know how.” That’s not magic, that’s anxiety with a bow on it. Real magic is when you open one screen at 3am and immediately see the correlation between the deploy 12 minutes ago and the latency spike in the payment service, with the three log lines that matter out of the 200,000 that don’t. You understood in seconds what would normally take 45 minutes of clicking through tabs and building queries.
That feeling — clarity arriving without the usual pain — that’s magic. And it’s the opposite of a black box. The product didn’t hide the complexity from you. It dissolved the friction between you and the understanding that was always there, buried under noise.
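In spirit, the correlation work described above is mechanical, and that's exactly why a machine should do it. Here's a deliberately naive toy sketch of the idea — timestamps, labels, and the 15-minute window are all invented for illustration; real tools weigh many more signals, but the shape of the job (align timelines, surface the few events that matter) is the same:

```python
from dataclasses import dataclass


@dataclass
class Event:
    ts: float    # seconds since some shared epoch
    label: str


def suspect_deploys(deploys, latency_spikes, window=900.0):
    """Flag deploys followed by a latency spike within `window` seconds.

    Returns (deploy_label, spike_label) pairs so a human can look at
    the candidates -- it surfaces correlations, it does not decide.
    """
    suspects = []
    for d in deploys:
        for s in latency_spikes:
            if 0 <= s.ts - d.ts <= window:
                suspects.append((d.label, s.label))
                break  # one matching spike is enough to flag this deploy
    return suspects
```

Given a deploy at t=0 and a p99 spike twelve minutes later, this flags the pair; a deploy an hour earlier stays quiet. The human still reads the flagged pair and decides whether it's cause or coincidence — which is the whole point.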
The best AI in operations doesn’t remove the human from the loop. It shrinks the loop so the human can think instead of dig.
Why we keep falling for it
The root cause isn’t technical. It’s emotional.
Manual work feels embarrassing. In an industry that worships efficiency and scale, admitting that your team still does something by hand feels like admitting failure. Like you’re not good enough. Not modern enough.
So we automate things we shouldn’t. We automate before we understand. We automate to signal competence rather than to solve problems.
The root cause of most automation projects isn’t “this is manual and needs to be automated.” It’s one of these:
“I’m tired of being paged at 3am” — which is an on-call culture problem, not an automation problem
“This is beneath me” — which is an ego problem
“We need to show progress” — which is a planning problem
“Everyone else has automated this” — which is a comparison problem
“Our new VP asked why this isn’t automated” — which is a political problem
None of those root causes are solved by the automation itself.
The part nobody puts in the blog post
Here’s what “just automate it” actually looks like in practice:
Week 1: Excitement. Proof of concept works. Demo goes great.
Week 4: Edge cases. The happy path is automated. The twelve other paths are not. Arguments about scope.
Week 8: The automation handles 80% of cases. The remaining 20% are now harder to deal with than the original manual process was, because first you have to figure out whether the automation should have worked but didn't.
Week 12: Someone suggests “just adding a manual override for the edge cases.” You are now maintaining two systems.
Month 6: The person who built it is on a different team. The automation breaks in a way nobody expected. Three people spend a day reading code they didn’t write to understand decisions they weren’t part of.
Year 2: The automation is now itself legacy. Someone proposes automating the automation. The cycle repeats.
I’m not against automation. I’ve built automation I’m proud of. But the best automation I ever built came after I deeply understood the manual process, after I understood why it was manual in the first place, and after I was honest about whether automation was solving the actual problem or just making me feel better about it.
The question worth asking
Before you automate something, try this:
Instead of “how do we automate this?” ask “what is the actual cost of not automating this?”
Not the theoretical cost. Not the “at scale” cost. The actual, current, measurable cost.
If the answer is “it takes someone 20 minutes once a month,” maybe the root cause of your frustration isn’t the manual process. Maybe it’s that your team is stretched too thin and every 20-minute task feels like a crisis. That’s a staffing problem. Automation won’t fix it — it’ll just move the stress somewhere else.
If the answer is “it’s error-prone and has caused three incidents this quarter,” now we’re talking. But even then — is the root cause the manual step, or is it that the process was poorly designed? Automating a bad process gives you a bad process that runs faster.
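The "actual cost" question reduces to back-of-envelope arithmetic. Here's a sketch of the break-even math — every number below is a placeholder to swap for your own estimates, and it deliberately includes the maintenance tax most proposals leave out:

```python
def automation_break_even_months(manual_minutes_per_month: float,
                                 build_hours: float,
                                 maintenance_hours_per_month: float) -> float:
    """Months until automation pays back its build cost, or inf if never.

    Compares hours saved per month against the one-time build cost plus
    the ongoing maintenance tax. All inputs are estimates you supply.
    """
    saved_per_month = manual_minutes_per_month / 60.0
    net_per_month = saved_per_month - maintenance_hours_per_month
    if net_per_month <= 0:
        return float("inf")  # the maintenance tax eats the savings forever
    return build_hours / net_per_month


# The 20-minutes-once-a-month task from above, with a 40-hour build and a
# modest 2-hours-a-month maintenance tax: it never pays back.
never = automation_break_even_months(20, build_hours=40,
                                     maintenance_hours_per_month=2)

# A genuinely expensive manual process (10 hours/month) breaks even in
# a few months -- that's the one worth automating.
soon = automation_break_even_months(600, build_hours=40,
                                    maintenance_hours_per_month=2)
```

If the break-even is "never" or "years," the math is telling you the frustration has a different root cause than the manual work.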
Let’s root cause this
The tech industry has a pattern: take a genuinely useful practice, strip away all the nuance, package it as an absolute, and sell it as the answer.
Agile became “just do standups.” DevOps became “just use Kubernetes.” Automation became “just automate it.” And now AI is becoming “just let the agent do it.”
Each cycle, the promise gets bigger and the understanding gap gets wider. A bash script you don’t maintain is a nuisance. An AI agent you don’t understand is a liability — one that sounds confident while it’s wrong.
The root cause is always the same: we want simple answers to complex problems. We want to skip the understanding and jump to the solution. We want the five-minute LinkedIn video, not the six-month learning curve. And now we want the AI to do the understanding for us, so we never have to do it at all.
But the people who’ve been in the trenches long enough know: the understanding is the solution. Everything else — the scripts, the pipelines, the copilots, the agents — is only as good as the understanding behind it.
Automate what you understand. Use AI to accelerate what you already know. But the moment you’re automating to avoid understanding? That’s not engineering. That’s debt. And unlike the technical kind, this debt compounds in ways nobody has a dashboard for yet.