This blog was written using AI and curated by Amit Srivastava, Director – Product Management
Summary: 47% of enterprises already run a hybrid AI agent model — combining off-the-shelf tools with custom development. Most are doing it accidentally. Here's how to do it deliberately.
"2025 was meant to be the year agents transformed the enterprise, but the hype turned out to be mostly premature. It wasn't a failure of effort. It was a failure of approach." - Kate Jensen, Head of Americas, Anthropic · TechCrunch, February 2026
Jensen's diagnosis is precise, and it matters that she made it in February 2026 — twelve months after the agent deployment wave crested. The teams that struggled in 2025 weren't short on ambition or resources. They were short on a coherent architecture for deciding what to build, what to buy, and how to govern the seam between the two.
The consequences of getting that decision wrong are not hypothetical. Gartner projects that more than 40% of agentic AI projects will be canceled by the end of 2027, citing escalating costs, unclear value, and inadequate risk controls.1 The same firms that generated a 1,445% surge in multi-agent system inquiries between Q1 2024 and Q2 2025 are now facing hard questions from CFOs about what any of it is actually delivering.
This piece is not a framework for deciding whether to pursue AI agents. That decision is largely made: Gartner separately forecasts that 40% of enterprise applications will incorporate AI agents by end of 2026, up from less than 5% today.3 The question is not if, but how to architect the decision intelligently.
| 47% | 40%+ | 1,445% |
|---|---|---|
| of enterprises already run a hybrid build + buy model | of agentic AI projects forecast to be canceled by end 2027 | surge in multi-agent system inquiries, Q1 2024–Q2 2025 |
| Anthropic State of AI Agents, 2026 | Gartner, Jun 2025 | Gartner, Dec 2025 |
The false binary that's costing you time and money
The dominant framing in every vendor deck and analyst report structures the decision as a binary: buy a platform-native agent (Salesforce Agentforce, Microsoft Copilot, ServiceNow AI) or build custom via APIs and open-source orchestration frameworks like LangGraph or AutoGen. Consulting firms have built entire practices around helping enterprises resolve this choice.
The Anthropic 2026 State of AI Agents report reveals that framing as empirically obsolete. The plurality of enterprises — 47% — are already combining off-the-shelf agents with custom-built ones. Only 21% rely entirely on pre-built agents; 20% are fully custom via APIs or open-source; the remainder are in various stages of transition.
The market has already voted for hybrid. The problem is that almost no one entered that state deliberately. They arrived there by accident: a vendor bought for one use case, a custom build launched for another, and now the two are running in parallel with no shared observability, no governance model for the seam between them, and no principled framework for which future capabilities belong where.
The goal of this piece is to give engineering leaders the architecture for converting accidental hybrid into deliberate hybrid — with a layer-by-layer decision framework, a data readiness gate, and a governance model for what happens at the seam.
Why pure-build fails at enterprise scale
Building everything custom is the position most attractive to engineering-led organizations, and the one most likely to produce a Gartner cancellation statistic. The failure mode is not technical capability — it's the three compounding traps that emerge between proof-of-concept and production.
Trap 1: The AI skills debt spiral
Building and maintaining production-grade AI agents requires a stack of capabilities that most enterprise engineering teams do not have in steady-state: prompt engineers who understand evaluation and regression testing, ML platform engineers who can build and operate inference infrastructure, and reliability engineers with experience in non-deterministic failure modes. The first custom agent typically ships on borrowed talent. The second and third require either significant hiring or the uncomfortable acknowledgment that the build velocity is unsustainable.
Trap 2: MLOps debt accumulation
The pattern that should worry engineering leaders most is the one that arrives silently. A team builds a custom agent that performs well in testing — reliable tool calls, clean outputs, low latency. Three months into production, support tickets start arriving: the agent is hallucinating in edge cases, producing contradictory outputs for similar queries, or failing silently when its context window fills up with tool call responses the team didn't account for in their capacity model. By the time this surfaces, the fix requires rearchitecting the memory management layer.
Custom agent infrastructure accretes technical debt faster than traditional software. Organizations that build their own orchestration layer instead of adopting an existing framework often discover — six to twelve months post-launch — that a disproportionate share of their AI engineering capacity is consumed by infrastructure maintenance: model versioning conflicts, context window edge cases, tool call logging gaps, fallback chain failures. None of this work surfaces as features. The engineering time it consumes is real; the competitive advantage it generates is not.
Trap 3: The undifferentiated infrastructure trap
The most insidious failure mode. Organizations pour engineering effort into building capabilities — document parsing, web browsing, code execution — that are already commoditized in the market. The insight that should govern all custom build decisions: only build what gives you a durable competitive advantage. If your competitors can buy the same capability for $50/month per seat, you should probably buy it too and allocate your engineers to the 10% of your agent architecture that actually encodes your differentiation.
Why pure-buy fails at enterprise scale
The buy-everything position is more defensible in initial economics, and more dangerous in three-year strategy. The failure modes here are structural, not technical.
| Failure mode | Why it compounds over time |
|---|---|
| Agent washing: The 2024–25 market saw dozens of existing SaaS products relabeled as "AI agents" with minimal underlying capability change. Gartner research identified only approximately 130 genuine agentic AI vendors out of thousands claiming the label.6 We consistently see this in enterprise evaluations: products described as "agentic" that, on technical review, execute a fixed multi-step workflow with no planning layer, no dynamic tool selection, and no state persistence across sessions — a chatbot with an API, purchased on the strength of a demo. | Vendor roadmap becomes your capability ceiling. When the agent cannot do what your process requires, you either adapt your process to the tool or you build around it — both options erode the initial ROI case. |
| Vendor lock-in at the orchestration layer: Platform-native agents (Salesforce, ServiceNow, Microsoft) deliver high initial velocity within their ecosystem. Cross-system orchestration — the case that generates most enterprise value — requires either expensive integration work or accepting that your agents cannot coordinate across your full stack. | As multi-agent architectures become table stakes, organizations locked into a single vendor's orchestration model face rebuild costs that were not in the original business case. |
| Data sensitivity constraints: Many enterprise workflows involve data that cannot traverse a vendor's inference infrastructure due to regulatory requirements (GDPR, HIPAA, SOC 2 commitments) or contractual confidentiality obligations. Pre-built agents that require cloud-side processing create compliance exposure that procurement teams discover after, not before, deployment. | The compliance remediation path for a deployed agent that is processing data it shouldn't be is expensive and slow. Prevention requires capability mapping before vendor selection. |
| Customization ceiling: Pre-built agents are optimized for the modal enterprise use case. Organizations with non-standard processes, proprietary data models, or domain-specific reasoning requirements will hit the customization ceiling — the point at which no amount of prompt configuration or workflow configuration can make the agent behave the way the process requires. | Discovering the customization ceiling after a 12-month deployment produces the worst possible outcome: switching costs are now embedded, and the build alternative that was available at the start of the project is now a rebuild. |
The five-layer decision framework
The insight that resolves the build/buy binary is architectural decomposition. An AI agent is not a monolithic thing — it is a stack of five distinct layers, each with its own differentiation economics, and each warranting a separate build/buy analysis.
What follows is Kellton's layer-by-layer framework. It emerged from observing how enterprises have adopted previous infrastructure technologies — cloud migration, microservices decomposition, API platformization — and applying those adoption patterns to the specific economics of AI agent architecture. In our experience, the recurring failure pattern is not at any single layer but at the junction between layers 2 and 4, where orchestration assumptions collide with domain logic requirements. For each layer, the verdict reflects the economics most enterprises will encounter. Your specific situation — data sensitivity, engineering capacity, competitive context — may shift any individual recommendation.
| Layer | What it is & why it matters | Default verdict |
|---|---|---|
| 1 · Foundation model | The underlying LLM (GPT-4o, Claude, Gemini, Llama). This layer determines reasoning quality, context window, cost per token, and data residency options. Fine-tuning is increasingly rare; prompt engineering and RAG handle most customization needs. | Buy via API |
| 2 · Orchestration | The framework managing agent execution: task decomposition, tool routing, multi-agent coordination, retry logic, and state management. Options range from LangGraph and AutoGen (open-source) to vendor-native (Salesforce Agentforce runtime). This is where lock-in risk is highest. | Hybrid: open-source + config |
| 3 · Tool integrations | The connectors exposing external systems (CRM, ERP, databases, APIs, web) to the agent. Generic integrations (Salesforce, Jira, Slack) are commoditized. Custom integrations — proprietary internal systems, legacy databases — require build effort proportional to integration complexity. | Hybrid: buy standard, build custom |
| 4 · Domain logic | The business rules, decision heuristics, and domain knowledge encoded into the agent's behavior: underwriting criteria, compliance checks, pricing logic, escalation thresholds. This is your differentiation. This is almost always a build — it is the layer competitors cannot replicate by buying the same vendor. | Build: own your moat |
| 5 · Observability | Logging, tracing, evaluation, and monitoring for agent behavior. This includes latency tracking, tool call audits, output quality scoring, and anomaly detection. Mature platforms (LangSmith, Weights & Biases, custom dashboards) exist. Building from scratch here is rarely justified. | Buy via platform |
The framework in one sentence: Buy your commodities, hybridize your connective tissue, and build only what encodes a durable competitive advantage. In most enterprise deployments, layer 4 — domain logic — is the only layer where building is consistently justified.
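One way to make the framework operational is to encode the default verdicts as a small, reviewable artifact that an architecture review board updates as the market shifts. A minimal Python sketch; the layer names and rationales are illustrative shorthand for the table above, not a prescribed schema:

```python
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    BUY = "buy"
    BUILD = "build"
    HYBRID = "hybrid"

@dataclass(frozen=True)
class Layer:
    name: str
    default_verdict: Verdict
    rationale: str

# Default verdicts from the five-layer framework above.
STACK = [
    Layer("foundation_model", Verdict.BUY, "Commodity reasoning; buy via API"),
    Layer("orchestration", Verdict.HYBRID, "Open-source framework plus your config"),
    Layer("tool_integrations", Verdict.HYBRID, "Buy standard connectors, build custom ones"),
    Layer("domain_logic", Verdict.BUILD, "Your differentiation; competitors cannot buy it"),
    Layer("observability", Verdict.BUY, "Mature platforms exist; rarely worth building"),
]

def layers_to_build(stack):
    """Return the layers where in-house engineering effort should concentrate."""
    return [layer.name for layer in stack if layer.default_verdict is Verdict.BUILD]
```

Checked against the default verdicts, only domain logic survives as a build — which is exactly the one-sentence version of the framework.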
The data readiness gate
Before any build or buy commitment, there is a prior question that most organizations skip: is your data infrastructure ready to support an AI agent at production quality? The majority of agentic AI project failures that Gartner identifies trace back to this omission — teams discover data problems six months into deployment, not six weeks before launch.
The following checklist is not exhaustive, but completing it before committing to architecture will eliminate the most common failure modes. Each item maps to a class of production incident we have observed in enterprise deployments.
Data readiness gate — complete before architecture commitment
- Data lineage is documented for all agent-accessible sources. The agent must be able to reason about data provenance. Undocumented sources create hallucination risk and compliance exposure.
- Data classification is complete for all workflows the agent will touch. PII, PHI, and contractually confidential data must be identified before tool integration design — not after vendor selection.
- Retrieval quality has been benchmarked, not assumed. RAG-based agents are only as good as their retrieval pipeline. In our experience, teams that skip this step and build agent logic on top of an untested retrieval layer discover precision problems in production that require rearchitecting the pipeline under time pressure. Test precision and recall against representative queries before building agent logic on top.
- Data freshness requirements are mapped to agent decision types. An agent making time-sensitive operational decisions (inventory, pricing, routing) has different freshness requirements than one doing analytical summarization. Mismatches produce silent errors, not loud failures.
- An evaluation dataset exists for the target use case. You cannot assess agent quality without a ground truth dataset. Building one before deployment is non-optional for production readiness.
- Data access controls have been reviewed for the agent's identity. Agents act with the permissions of whatever identity they run as. Ensure least-privilege access is enforced — agents should not have broader data access than the task requires.
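The retrieval-quality item in particular benefits from a concrete harness rather than a judgment call. A minimal sketch of precision/recall at k over a labeled evaluation set; `retrieve` here stands in for whatever retrieval pipeline you actually run and is an assumed integration point, not a specific library API:

```python
def retrieval_precision_recall(retrieved_ids, relevant_ids):
    """Precision and recall for one query, given the document IDs the
    pipeline returned and the ground-truth relevant set."""
    retrieved, relevant = set(retrieved_ids), set(relevant_ids)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def benchmark(eval_set, retrieve, k=5):
    """Average precision and recall at k across a labeled evaluation set.

    eval_set: dict mapping query -> set of relevant doc IDs (ground truth).
    retrieve: your retrieval pipeline, (query, k) -> ranked list of doc IDs.
    """
    scores = [retrieval_precision_recall(retrieve(query, k), relevant)
              for query, relevant in eval_set.items()]
    n = len(scores)
    avg_precision = sum(p for p, _ in scores) / n
    avg_recall = sum(r for _, r in scores) / n
    return avg_precision, avg_recall
```

Running this against a few dozen representative queries before any agent logic is written is cheap; discovering the same precision problem in production, as noted above, is not.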
Governing the hybrid: the seam is where projects fail
Hybrid AI agent architectures do not fail at the build layer or the buy layer in isolation. They fail at the seam — the interface between custom-built and off-the-shelf components. Governing that seam requires explicit decisions in three areas that most teams leave implicit.
| Orchestration governance | Observability governance | Team topology |
|---|---|---|
| Define a single orchestration authority | Unified trace context across all agents | Name a seam owner |
| In a hybrid architecture, every agent — whether custom-built or vendor-provided — must register with a single orchestration layer. This is typically an open-source framework (LangGraph, AutoGen) or a custom orchestration service. Vendor-native orchestration that cannot be subordinated to this layer is an integration liability. Establish this constraint before vendor evaluation, not after. | In a hybrid system, a single user request may traverse both a custom agent and a vendor agent. If these agents emit traces to different observability systems, debugging a production incident requires stitching together logs from two or more platforms — a process that doubles incident response time in our experience. Require all agents to propagate a shared trace context. For vendor agents, this may require wrapping the vendor's API in a thin observability proxy. | The team that built the custom agent does not own the vendor agent, and the vendor agent is not owned by anyone internal. This organizational gap is the most reliable predictor of operational incidents that persist. Name an explicit seam owner — typically a platform engineering team — with responsibility for integration health, shared observability, and vendor relationship escalation. Without a named owner, the seam is ungoverned by default. |
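The shared-trace-context requirement for vendor agents can be met with the thin observability proxy the table describes. A hedged sketch of the idea: `call_vendor` and `emit` are placeholder integration points for your vendor's API and your tracing backend, not real SDK calls:

```python
import time
import uuid

class VendorAgentProxy:
    """Wrap a vendor agent's API so its calls emit trace events into the
    same observability system as custom-built agents."""

    def __init__(self, call_vendor, emit):
        self._call_vendor = call_vendor   # (payload) -> response; vendor API stand-in
        self._emit = emit                 # (event_dict) -> None; tracing backend stand-in

    def invoke(self, payload, trace_id=None, parent_span=None):
        # Propagate the caller's trace context, or start a new trace.
        trace_id = trace_id or uuid.uuid4().hex
        span_id = uuid.uuid4().hex
        start = time.monotonic()
        try:
            response = self._call_vendor(payload)
            status = "ok"
            return response
        except Exception:
            status = "error"
            raise
        finally:
            # Emit the span whether the vendor call succeeded or failed,
            # so incidents at the seam are visible in one place.
            self._emit({
                "trace_id": trace_id,
                "span_id": span_id,
                "parent_span": parent_span,
                "component": "vendor_agent",
                "status": status,
                "duration_ms": round((time.monotonic() - start) * 1000, 2),
            })
```

A production version would follow an established propagation format (for example W3C Trace Context) rather than this ad hoc dictionary, but the architectural point is the same: the vendor agent never executes outside your trace.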
The decision scorecard: three axes, one hybrid zone
For each capability you are evaluating — now and as new use cases emerge — score it against three axes: the capability type itself, its uniqueness relative to competitors, and its data sensitivity. The combination determines whether it belongs in the buy, build, or hybrid zone of your architecture.
| Capability type | Uniqueness | Data sensitivity | Recommended |
|---|---|---|---|
| Generic workflow automation (email triage, meeting summaries, ticket routing) | Low — identical across competitors | Low–Medium | Buy |
| Standard system integrations (Salesforce, Jira, ServiceNow connectors) | Low — commoditized connectors | Medium | Buy |
| Cross-system orchestration (multi-agent coordination across owned and vendor agents) | Medium — architecture is proprietary, tooling is not | Medium | Hybrid |
| Proprietary data retrieval and reasoning (internal knowledge, historical records) | High — data is unique, retrieval logic is unique | High | Hybrid |
| Domain logic and decision rules (underwriting, pricing, compliance, clinical protocols) | High — this is your product | High | Build |
| Regulated data workflows (HIPAA, GDPR-sensitive, contractually confidential processing) | Varies | Critical — cannot leave boundary | Build |
The hybrid zone is not a compromise — it is a deliberate architectural position. Capabilities in the hybrid zone typically use open-source or vendor orchestration frameworks but run on infrastructure you control, with domain-specific configuration and prompting that encodes your proprietary knowledge. The vendor provides the chassis; you provide the engine.
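The scorecard can also be expressed as a small triage function for new capability requests. A sketch with illustrative thresholds; the `encodes_differentiation` flag is an assumption introduced here to separate capabilities that are your product (build) from those that merely touch proprietary data (hybrid):

```python
def recommend(uniqueness, sensitivity, encodes_differentiation=False):
    """Map scorecard axes to a zone, following the table above.

    uniqueness: 'low' | 'medium' | 'high'
    sensitivity: 'low' | 'medium' | 'high' | 'critical'
    encodes_differentiation: True when the capability IS your product
        (domain logic), not just a consumer of proprietary data.
    Thresholds are illustrative; calibrate against your own portfolio.
    """
    if sensitivity == "critical":
        return "build"   # regulated data cannot leave your boundary
    if uniqueness == "high":
        return "build" if encodes_differentiation else "hybrid"
    if uniqueness == "medium":
        return "hybrid"  # proprietary architecture on commodity tooling
    return "buy"         # commoditized capability; do not spend engineers here
```

The function reproduces each row of the scorecard: generic automation buys, cross-system orchestration and proprietary retrieval land in the hybrid zone, and only domain logic and regulated workflows justify a build.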
What 2026 asks of engineering leaders specifically
The strategic question for CTOs and VPs of Engineering is not which agent platform to choose. It is how to build an organizational capability for ongoing hybrid architecture decisions as the agent market continues to evolve — and it will evolve faster in 2026 and 2027 than it did in 2024 and 2025.
The practical work of becoming an organization that can execute on hybrid AI agents is not primarily technical. It is architectural and organizational. Engineering leaders who succeed in 2026 and 2027 will be the ones who built two things alongside their agents: a repeatable decision process for where new capabilities belong in the stack, and a platform function that owns the seam — the integration layer between what you built and what you bought — with the same rigor they apply to production infrastructure.
The Gartner cancellation wave is not going to claim the enterprises that found the best vendor or built the cleverest custom system. It will claim the ones who accumulated technical and governance debt at the seam, accrued shadow AI spend outside any architectural review, and discovered their vendor's customization ceiling twelve months after the decision was irreversible.
You now have the framework to avoid that trajectory. The five layers give you a decision surface. The data readiness gate gives you a pre-commitment discipline. The governance model gives you the seam. What you do with it is an execution problem — and execution problems are solvable.
A note on Kellton's AI practice: The framework described in this piece is vendor-neutral by design — the layer decomposition and governance model apply regardless of which stack you are running. For organizations where the orchestration and tool integration layer is a bottleneck, Kellton's AI practice has built production hybrid architectures across financial services, healthcare, and logistics environments. KAI, Kellton's enterprise-grade Agentic AI platform launched in 2025, is designed to accelerate work at the orchestration and integration layers while preserving build flexibility where it matters most.
The deliberate hybrid is a choice, not a destination
Forty-seven percent of enterprises are already hybrid. The question is whether that state was arrived at through deliberate architectural decisions or through accumulated vendor purchases and one-off custom builds that are now running alongside each other without shared governance.
The failure Kate Jensen described at Anthropic was not a failure of technology. It was a failure of approach. The approach, it turns out, is architecture.
Kellton's AI practice runs layer decomposition workshops for enterprise engineering teams — typically a half-day engagement that produces a documented architecture decision and a seam governance plan. We do not recommend vendors; we help you decide which layers to own.