A Deductive AI success story

Using agentic intelligence to accelerate RCA up to 90%

Executive summary

Started by a team of founding engineers from Databricks and ThoughtSpot, Deductive AI is building the next generation of AI-powered troubleshooting systems for modern distributed environments. With only 10 employees, Deductive AI is already helping cloud-native organizations significantly reduce the time and complexity of root cause analysis across vast telemetry pipelines.

Using Akka, the team builds extremely large knowledge graphs, spanning millions of nodes and edges, from billions of time series, petabytes of logs, and hundreds of millions of lines of code changes. The data comes from systems including Prometheus, Datadog, New Relic, and GitHub, and is processed with low latency and high resilience to enable real-time operational intelligence.

 

Akka @ Deductive AI
  • Elasticity: Multisource data ingestion across diverse telemetry providers and millions of log events, with integration into a live knowledge graph.
  • Agility: A 3-person team delivering powerful root cause automation and AI workflows for medium-to-large companies operating cloud infrastructure.
  • Resilience: Built-in concurrency and backpressure in Akka ensure graceful failure handling and persistent knowledge modeling for reliable troubleshooting.

Akka was a natural fit for our team. We needed a high-performance concurrency framework to power our ingest and graph modeling systems, and it was tailor-made for our requirements.

Pratyush Verma
Software Engineer, Deductive AI

The challenge

While working at Meta, the Deductive AI founding team witnessed firsthand the challenges of diagnosing failures across massive, multi-region Spark workloads. Root cause analysis (RCA) at this scale required not only human expertise but also faster, more scalable automation capable of analyzing logs, metrics, and code changes with contextual awareness.

They founded Deductive AI to address this challenge by creating a reinforcement learning (RL)-based multi-agent AI system that automatically detects probable causes of system failures, ranks hypotheses, and provides actionable insights to developers in real time. The agents collaborate on tasks such as runbook generation, context gathering, hypothesis generation and ranking, and online evaluations to guide the investigation effectively.

To build it, they needed a platform capable of processing millions of events in real time, interacting with upstream sources that have varying latency guarantees and failure conditions, and isolating faulty sources so that one failing source cannot disrupt the rest. The ingested data includes telemetry metrics, log indexes, code changes, and any other data generated by their connectors, such as GitHub, Datadog, and Prometheus.
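The isolation requirement can be illustrated with a small per-source circuit breaker. This is a minimal sketch in plain Python, not Deductive AI's implementation and not an Akka API: the names `SourceBreaker` and `poll_all`, the failure threshold, and the cooldown are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceBreaker:
    """Per-source circuit breaker (hypothetical helper for illustration)."""
    max_failures: int = 3               # assumed: consecutive failures before isolation
    cooldown_s: float = 60.0            # assumed: how long an isolated source is skipped
    failures: int = 0
    opened_at: Optional[float] = None   # time the source was isolated, if any

    def allow(self, now: float) -> bool:
        """Return True if the source may be polled at time `now`."""
        if self.opened_at is None:
            return True
        # After the cooldown, let one trial call through (half-open state).
        return now - self.opened_at >= self.cooldown_s

    def record(self, ok: bool, now: float) -> None:
        """Update the breaker with the outcome of one poll."""
        if ok:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = now


def poll_all(connectors, breakers, now):
    """Poll every allowed source once; a failing source never blocks the others.

    `connectors` maps a source name to a zero-argument fetch callable.
    Returns (events, skipped_source_names).
    """
    events, skipped = [], []
    for name, fetch in connectors.items():
        breaker = breakers.setdefault(name, SourceBreaker())
        if not breaker.allow(now):
            skipped.append(name)
            continue
        try:
            events.extend(fetch())
            breaker.record(True, now)
        except Exception:
            # The failure is contained to this source; polling continues.
            breaker.record(False, now)
    return events, skipped
```

With these assumed defaults, a source that fails three polls in a row is skipped for the next 60 seconds while healthy sources continue to deliver events, and it is retried once the cooldown has elapsed.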

The solution

Akka is central to Deductive AI’s data ingestion layer. It powers the flow of telemetry into a Neo4j-based knowledge graph, which models system relationships, time series metrics, log patterns, and change events. From there, a multi-agent system operates on top of the knowledge graph. This agentic system reacts to alerts or user queries by:

  • Parsing metrics, logs, and deployments to isolate anomalies

  • Asking intelligent, structured questions of telemetry providers

  • Prioritizing likely causes and gathering supporting evidence

  • Operating either autonomously or in a human-in-the-loop mode, depending on the use case


By building with Akka, Deductive AI provides key technical features, including:

  • Near-real-time periodic ingestion at minute-level granularity, tuned to the rate limits of each data source
  • Event-based coordination of agents using Akka and LangChain
  • Guardrails and checkpoints to avoid runaway token usage or infinite analysis loops
  • Support for Claude and GPT-4, with model support evolving alongside model capabilities
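The guardrails and checkpoints in the list above can be pictured as a bounded investigation loop. The sketch below is illustrative only and assumes details the story does not give: `investigate`, `agent_step`, and the two limits are hypothetical names, and each agent step is modeled as a callable that reports how many tokens it spent.

```python
def investigate(agent_step, max_tokens, max_steps):
    """Run agent steps until one finishes or a guardrail stops the loop.

    `agent_step` is a stand-in callable returning (tokens_spent, finding, done).
    Returns (findings, status); findings gathered before a stop are kept,
    so partial evidence survives as a checkpoint.
    """
    findings = []
    tokens_used = 0
    for _ in range(max_steps):          # hard cap: no infinite analysis loops
        tokens, finding, done = agent_step()
        tokens_used += tokens
        findings.append(finding)        # checkpoint the evidence from this step
        if done:
            return findings, "completed"
        # Token cost is only known after a step, so the budget is checked here.
        if tokens_used >= max_tokens:
            return findings, "stopped: token budget exhausted"
    return findings, "stopped: step limit reached"
```

Because the evidence collected so far is returned even when a limit is hit, a stopped investigation still hands the engineer something to work with.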


Akka provides unique capabilities to Deductive AI, including:

  • Elastic data ingestion: Connects to GitHub, Prometheus, Datadog, and more, feeding data into a live entity-relationship graph.
  • Resilient concurrency: Ensures telemetry pipelines don’t break under pressure and can be scaled easily across JVM nodes.
  • Developer speed: Akka fits the team’s background and enables clean, maintainable ingest code.
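The idea behind pipelines that "don’t break under pressure" is backpressure: a fast producer is slowed to the pace of its consumer instead of overflowing memory. The sketch below shows the principle with a bounded queue from the Python standard library. It is an analogy only; Akka Streams signals demand between stages without blocking threads, and `run_pipeline` and its buffer size are hypothetical names chosen for this example.

```python
import queue
import threading


def run_pipeline(items, buffer_size=4):
    """Push items through a bounded buffer; return (processed, max_depth)."""
    buf = queue.Queue(maxsize=buffer_size)
    processed = []
    max_depth = 0
    done = object()                     # sentinel marking the end of the stream

    def producer():
        for item in items:
            buf.put(item)               # blocks while the buffer is full: backpressure
        buf.put(done)

    def consumer():
        nonlocal max_depth
        while True:
            max_depth = max(max_depth, buf.qsize())
            item = buf.get()
            if item is done:
                break
            processed.append(item)

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed, max_depth
```

However quickly the producer runs, the number of in-flight items never exceeds `buffer_size`, so memory use stays bounded while every item is still delivered in order.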

The results

With Akka and AI agents working together, Deductive AI is setting a new benchmark for intelligent observability and automated root cause diagnostics. Deductive AI’s deployments have delivered remarkable time savings:

  • Average 70% reduction in time-to-mitigation
  • Improved developer velocity
  • Fewer fire drills for SRE and platform teams
  • Platform extensibility to accommodate future model evolution and richer telemetry sources

     

As Pratyush notes, “Even for the small number of cases where we don’t find the exact root cause, the evidence gathered by the system is so useful that it’s still a huge time-saver.”
