Agentic AI: Why Experience Matters More Than Hype

Written by Tyler Jewell | Aug 28, 2025 6:25:00 PM

The artificial intelligence landscape is evolving at a breakneck pace, and nowhere is this more apparent than in the emerging field of agentic AI. As companies rush to build and deploy intelligent, distributed systems, the conversation is shifting from what’s possible to what’s reliable and trustworthy.

But in the midst of this shift, we’re also seeing a familiar pattern of speculative frenzy—a “bubble that knows it’s a bubble,” as Craig McCaskill aptly puts it. Valuations are soaring far beyond the fundamentals of revenue or, more critically, proven longevity.

Take LangChain. It has seen significant experimentation in early projects. A recent report suggests a $1B+ valuation, with annual recurring revenue (ARR) estimated at less than $15 million. A 66x multiple? 🤔 Based on the history of other tectonic technology shifts, this kind of disconnect between market hype and production results should give every enterprise leader pause.

Recent community discussions about LangChain suggest many have already paused, and early data from the broader market shows a clear need to separate hype from results.

This isn’t a failure of imagination—it’s a failure of infrastructure engineering. These statistics don’t indicate that AI is dead. Not even close. They show that most agentic projects today are experiments, not hardened systems. That’s the real gap. The hype is meaningful. The deployments are not.

Agentic systems are distributed systems—with new classes of complexity

What makes agentic AI—systems that can plan and act autonomously—attractive is also what makes it fundamentally complex to operate at scale. These systems are not just stateless function calls or chained APIs. They are stateful, long-running, concurrent processes that act autonomously in partially observable environments.

That puts them squarely in the domain of distributed systems. 

And they introduce new layers of complexity that go beyond conventional cloud architectures:

  • Stochastic Components. LLMs are high-latency, non-deterministic functions. They introduce probabilistic behavior into the execution path, requiring new approaches to resilience, testing, and observability.
  • Specialized Infrastructure. LLM inference often runs on GPUs, with different scaling, scheduling, and failure modes than CPU-bound services. Coordinating between CPU-based agents and GPU-based reasoning steps adds new resource orchestration concerns.
  • Long-Lived Workflows. Agentic systems aren’t request/response. Tasks may span hours, days, or weeks. They require persistent memory, resumable execution, and consistent coordination across time.
  • Side-Effecting Actions. Unlike pure data pipelines, agentic systems routinely take actions with irreversible effects—sending emails, updating records, provisioning infrastructure. That raises the stakes for consistency, idempotency, and auditability.

This isn’t just “microservices, but smarter.” It’s a fundamentally harder problem domain.

Agentic systems require you to reason about durability, observability, backpressure, concurrency, state propagation, and scheduling, all while considering probabilities, predictions, and real-world side effects. You don’t get this for free from an LLM. And you certainly don’t get it from a new orchestration library.

Agentic systems are distributed systems, but with failure modes and complexity classes most teams haven’t had to routinely deal with before.

We've seen this movie before

LangGraph, a LangChain product, is one of the frameworks in the agentic AI ecosystem. It allows developers to define agent workflows as graphs, combining memory, tools, branching logic, and control flow into a single execution model.

But it’s important to understand what LangGraph really is: a durable execution engine.

That means it’s implicitly responsible for coordinating distributed state across time, machines, and failures. It must ensure that when a step is defined, it will eventually be executed, exactly once, in the right order, with the right context, even in the face of failure.

This is a fundamentally hard class of system to build. Durable execution engines are not just function orchestrators. They are distributed systems that need to account for persistence, recovery, idempotency, retries, and external side effects.
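
The core idea can be made concrete with a toy sketch: journal each completed step so that re-running the workflow after a crash replays recorded results instead of repeating work. This is an illustrative simplification, not LangGraph's actual internals; a real engine persists the journal durably and handles concurrency, versioning, and recovery:

```python
# Toy durable-execution core: completed steps are journaled, so re-running
# the workflow after a crash replays recorded results instead of redoing work.
journal: dict[str, str] = {}       # a real engine persists this durably

def run_step(name: str, fn):
    if name in journal:            # step finished before the "crash"
        return journal[name]
    result = fn()
    journal[name] = result         # commit the result before moving on
    return result

calls = []                         # tracks how often side effects actually run

def workflow() -> str:
    def plan_step():
        calls.append("plan")
        return "itinerary"
    plan = run_step("plan", plan_step)

    def book_step():
        calls.append("book")
        return f"booked {plan}"
    return run_step("book", book_step)

workflow()                          # first run executes both steps
result = workflow()                 # "crash and restart": replays the journal
assert result == "booked itinerary"
assert calls == ["plan", "book"]    # each side effect executed exactly once
```

Everything hard about durable execution lives in the gap this toy ignores: committing the journal entry and the side effect atomically, across machines that can fail between the two.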

We’ve seen this exact challenge play out before:

  • Apache Storm delivered on low-latency processing, but failed under load due to missing backpressure controls and brittle fault recovery.
  • Netflix Conductor brought workflow orchestration to microservices, but struggled with retry safety, coordination semantics, and observability under scale.
  • Even Apache Spark, now battle-hardened, took years of operational refinement to handle the realities of distributed memory management, fault recovery, and job scheduling.

These systems didn’t fall short because they were poorly designed. They fell short because distributed systems are adversarial by nature. Edge cases happen constantly, and the only way to build confidence is by surviving them in production.

LangGraph hasn’t gone through that phase yet. It’s early, evolving, and promising, but as with all systems in this category, the abstractions are only as trustworthy as the runtime beneath them.

Durable execution is a commitment, not a feature or an add-on product. And in distributed systems, every commitment comes with a liability: to state, to failure handling, to correctness. You don’t get those guarantees by writing a YAML spec or wrapping a call in a try/except block. You get them by building infrastructure that has failed—and been fixed—at scale.

What we have learned at Akka

Over the past 15 years, Akka has helped companies build distributed systems that operate reliably at scale. The systems powered by Akka don’t just serve dashboards or call APIs. They detect fraud, manage payments, route logistics, process petabytes of data, and keep global infrastructure running. Some of our customers have achieved more than a decade of uninterrupted uptime.

That didn’t happen by chance. It came from solving real operational problems like:

  • Queue saturation → Prioritized, bounded mailboxes and backpressure-aware protocols
  • Node crashes during execution → Event-sourced actors that persist state and resume correctly
  • Cascading retries → Circuit breakers and supervisory hierarchies
  • Distributed debugging → Structured tracing, actor-based observability, and log correlation
  • Workflows that span time → Durable, resumable processes that maintain state across restarts and deployments
  • And much more…
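
To make one of these concrete: a circuit breaker keeps a failing downstream dependency from triggering cascading retries by fast-failing once a failure threshold is reached. The sketch below is deliberately minimal; Akka's actual CircuitBreaker adds call timeouts and a half-open state that probes for recovery:

```python
class CircuitOpen(Exception):
    """Raised when the breaker fast-fails instead of calling downstream."""

class CircuitBreaker:
    """Trips open after `max_failures` consecutive failures."""

    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self.failures = 0

    def call(self, fn):
        if self.failures >= self.max_failures:
            raise CircuitOpen("downstream unhealthy; refusing the call")
        try:
            result = fn()
        except Exception:
            self.failures += 1     # count toward tripping the breaker
            raise
        self.failures = 0          # any success resets the count
        return result

breaker = CircuitBreaker(max_failures=2)

def flaky():
    raise TimeoutError("downstream timed out")

for _ in range(2):                 # two consecutive failures trip the breaker
    try:
        breaker.call(flaky)
    except TimeoutError:
        pass

tripped = False
try:
    breaker.call(flaky)            # third call never reaches downstream
except CircuitOpen:
    tripped = True
assert tripped
```

The value is in what the third call does not do: it never touches the struggling service, which is what stops a local failure from becoming a cascading one.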

This wasn’t theoretical. Each architectural choice came from a production incident or a scaling constraint. Real infrastructure evolves with postmortems, not design documents.

The path forward: What enterprises actually care about

When we talk to executives and engineering teams building agentic systems, their goals aren’t philosophical; they’re operational. And across industries, the priorities are consistent:

  1. Get into production, quickly. It’s not enough to spin up a demo or run a quick start. Organizations want to ship agentic capabilities into production, and continue shipping, week after week, release after release. That means architectural consistency, developer velocity, and tooling that doesn’t fall apart under real-world pressure.
  2. Stay there, safely. Reliability is just the baseline. Teams need consistency, observability, evaluation, compliance, and security, especially when agents are acting on sensitive data or triggering real-world side effects. It all comes down to trust.
  3. Scale, cost-effectively. Once agentic systems are live, efficiency matters. The ability to scale horizontally, contain compute costs, minimize token consumption, and avoid operational complexity is a gating factor for production success.

These are the outcomes that define success for agentic systems in the enterprise. And they’re the reason most projects fail to cross the gap from prototype to production. This is also why experts in the trenches with enterprise customers say, “Quite simply, Akka provides the industry’s best way to build agentic AI systems that scale in the enterprise and ensure stability, performance, and outcomes.”

The promise of agentic AI is real, but the technology to deliver it must be built on a foundation of stability, trust, and proven performance. In a hyped-up market where a vendor's valuation can balloon (and evaporate) overnight, it is essential to choose a partner that has been building the future of distributed systems for more than a decade, not just riding the current wave.