
Stop Choosing Sides: An Engineering Leader's Framework for Build, Buy, and Hybrid AI Agents in 2026

Agentic AI
April 10, 2026
Posted by: Amit Shrivastav
15 min read



This blog was written using AI and curated by Amit Srivastava, Director – Product Management.

Summary: 47% of enterprises already run a hybrid AI agent model — combining off-the-shelf tools with custom development. Most are doing it accidentally. Here's how to do it deliberately.

"2025 was meant to be the year agents transformed the enterprise, but the hype turned out to be mostly premature. It wasn't a failure of effort. It was a failure of approach." - Kate Jensen, Head of Americas, Anthropic · TechCrunch, February 2026

Jensen's diagnosis is precise, and it matters that she made it in February 2026 — twelve months after the agent deployment wave crested. The teams that struggled in 2025 weren't short on ambition or resources. They were short on a coherent architecture for deciding what to build, what to buy, and how to govern the seam between the two.

The consequences of getting that decision wrong are not hypothetical. Gartner projects that more than 40% of agentic AI projects will be cancelled by the end of 2027, citing escalating costs, unclear value, and inadequate risk controls.[1] The same enterprises that drove a 1,445% surge in multi-agent system inquiries between Q1 2024 and Q2 2025 are now facing hard questions from CFOs about what any of it is actually delivering.

This piece is not a framework for deciding whether to pursue AI agents. That decision is largely made: Gartner separately forecasts that 40% of enterprise applications will incorporate AI agents by end of 2026, up from less than 5% today.[3] The question is not if, but how to architect the decision intelligently.

47% of enterprises already run a hybrid build + buy model (Anthropic State of AI Agents, 2026)
40%+ of agentic AI projects forecast to be cancelled by end of 2027 (Gartner, Jun 2025)
1,445% surge in multi-agent system inquiries, Q1 2024–Q2 2025 (Gartner, Dec 2025)

The false binary that's costing you time and money

The dominant framing in every vendor deck and analyst report structures the decision as a binary: buy a platform-native agent (Salesforce Agentforce, Microsoft Copilot, ServiceNow AI) or build custom via APIs and open-source orchestration frameworks like LangGraph or AutoGen. Consulting firms have built entire practices around helping enterprises resolve this choice.

The Anthropic 2026 State of AI Agents report reveals that framing as empirically obsolete. The plurality of enterprises — 47% — are already combining off-the-shelf agents with custom-built ones. Only 21% rely entirely on pre-built agents; 20% are fully custom via APIs or open-source; the remainder are in various stages of transition.

The market has already voted for hybrid. The problem is that almost no one entered that state deliberately. They arrived there by accident: a vendor bought for one use case, a custom build launched for another, and now the two are running in parallel with no shared observability, no governance model for the seam between them, and no principled framework for which future capabilities belong where.

The goal of this piece is to give engineering leaders the architecture for converting accidental hybrid into deliberate hybrid — with a layer-by-layer decision framework, a data readiness gate, and a governance model for what happens at the seam.

Why pure-build fails at enterprise scale

Building everything custom is the position most attractive to engineering-led organizations, and the one most likely to produce a Gartner cancellation statistic. The failure mode is not technical capability — it's the three compounding traps that emerge between proof-of-concept and production.

Trap 1: The AI skills debt spiral

Building and maintaining production-grade AI agents requires a stack of capabilities that most enterprise engineering teams do not have in steady-state: prompt engineers who understand evaluation and regression testing, ML platform engineers who can build and operate inference infrastructure, and reliability engineers with experience in non-deterministic failure modes. The first custom agent typically ships on borrowed talent. The second and third require either significant hiring or the uncomfortable acknowledgment that the build velocity is unsustainable.

Trap 2: MLOps debt accumulation

The pattern that should worry engineering leaders most is the one that arrives silently. A team builds a custom agent that performs well in testing — reliable tool calls, clean outputs, low latency. Three months into production, support tickets start arriving: the agent is hallucinating in edge cases, producing contradictory outputs for similar queries, or failing silently when its context window fills up with tool call responses the team didn't account for in their capacity model. By the time this surfaces, the fix requires rearchitecting the memory management layer.

Custom agent infrastructure accretes technical debt faster than traditional software. Organizations that build their own orchestration layer instead of adopting an existing framework often discover — six to twelve months post-launch — that a disproportionate share of their AI engineering capacity is consumed by infrastructure maintenance: model versioning conflicts, context window edge cases, tool call logging gaps, fallback chain failures. None of this work surfaces as features. The engineering time it consumes is real; the competitive advantage it generates is not.
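
The silent context-window overflow described above is preventable with an explicit token budget. The sketch below is illustrative: the model limit, the reserve, and the crude `count_tokens` helper are assumptions, and a real deployment would use its model provider's tokenizer instead.

```python
# Sketch of a context-window budget guard for agent tool-call results.
# MAX_CONTEXT_TOKENS and count_tokens() are illustrative assumptions;
# substitute your model's real limit and tokenizer.

MAX_CONTEXT_TOKENS = 128_000   # assumed model limit
RESPONSE_RESERVE = 4_096       # tokens reserved for the model's answer

def count_tokens(text: str) -> int:
    # Crude stand-in: roughly 4 characters per token.
    return max(1, len(text) // 4)

def fit_tool_results(history: list[str], tool_results: list[str]) -> list[str]:
    """Keep only the tool results that fit the remaining token budget,
    dropping the oldest first instead of overflowing silently."""
    budget = (MAX_CONTEXT_TOKENS - RESPONSE_RESERVE
              - sum(count_tokens(m) for m in history))
    kept: list[str] = []
    # Walk newest-to-oldest so the most recent results survive.
    for result in reversed(tool_results):
        cost = count_tokens(result)
        if cost > budget:
            break
        kept.append(result)
        budget -= cost
    return list(reversed(kept))
```

The point is not this particular policy (truncation, summarization, and eviction are all viable) but that the capacity model exists before production, not after the incident.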

Trap 3: The undifferentiated infrastructure trap

The most insidious failure mode. Organizations pour engineering effort into building capabilities — document parsing, web browsing, code execution — that are already commoditized in the market. The insight that should govern all custom build decisions: only build what gives you a durable competitive advantage. If your competitors can buy the same capability for $50/month per seat, you should probably buy it too and allocate your engineers to the 10% of your agent architecture that actually encodes your differentiation.

Why pure-buy fails at enterprise scale

The buy-everything position is more defensible in initial economics, and more dangerous in three-year strategy. The failure modes here are structural, not technical.

Four structural failure modes recur, and each compounds over time:

Agent washing. The 2024–25 market saw dozens of existing SaaS products relabeled as "AI agents" with minimal underlying capability change. Gartner research identified only approximately 130 genuine agentic AI vendors out of thousands claiming the label.[6] We consistently see this in enterprise evaluations: vendor products described as "agentic" that, on technical review, execute a fixed multi-step workflow with no planning layer, no dynamic tool selection, and no state persistence across sessions. The product is a chatbot with an API, and purchasing decisions made on the basis of vendor demos frequently meet a production experience to match. Why it compounds: the vendor roadmap becomes your capability ceiling. When the agent cannot do what your process requires, you either adapt your process to the tool or you build around it — both options erode the initial ROI case.

Vendor lock-in at the orchestration layer. Platform-native agents (Salesforce, ServiceNow, Microsoft) deliver high initial velocity within their ecosystem. Cross-system orchestration — the case that generates most enterprise value — requires either expensive integration work or accepting that your agents cannot coordinate across your full stack. Why it compounds: as multi-agent architectures become table stakes, organizations locked into a single vendor's orchestration model face rebuild costs that were not in the original business case.

Data sensitivity constraints. Many enterprise workflows involve data that cannot traverse a vendor's inference infrastructure due to regulatory requirements (GDPR, HIPAA, SOC 2 commitments) or contractual confidentiality obligations. Pre-built agents that require cloud-side processing create compliance exposure that procurement teams discover after, not before, deployment. Why it compounds: the remediation path for a deployed agent that is already processing data it shouldn't be is expensive and slow. Prevention requires capability mapping before vendor selection.

Customization ceiling. Pre-built agents are optimized for the modal enterprise use case. Organizations with non-standard processes, proprietary data models, or domain-specific reasoning requirements will hit the customization ceiling — the point at which no amount of prompt or workflow configuration can make the agent behave the way the process requires. Why it compounds: discovering the ceiling after a 12-month deployment produces the worst possible outcome. Switching costs are now embedded, and the build alternative that was available at the start of the project is now a rebuild.

The five-layer decision framework

The insight that resolves the build/buy binary is architectural decomposition. An AI agent is not a monolithic thing — it is a stack of five distinct layers, each with its own differentiation economics, and each warranting a separate build/buy analysis.

What follows is Kellton's layer-by-layer framework. It emerged from observing how enterprises have adopted previous infrastructure technologies — cloud migration, microservices decomposition, API platformization — and applying those adoption patterns to the specific economics of AI agent architecture. In our experience, the recurring failure pattern is not at any single layer but at the junction between layers 2 and 4, where orchestration assumptions collide with domain logic requirements. For each layer, the verdict reflects the economics most enterprises will encounter. Your specific situation — data sensitivity, engineering capacity, competitive context — may shift any individual recommendation.

  1. Foundation model. The underlying LLM (GPT-4o, Claude, Gemini, Llama). This layer determines reasoning quality, context window, cost per token, and data residency options. Fine-tuning is increasingly rare; prompt engineering and RAG handle most customization needs. Default verdict: buy via API.
  2. Orchestration. The framework managing agent execution: task decomposition, tool routing, multi-agent coordination, retry logic, and state management. Options range from LangGraph and AutoGen (open-source) to vendor-native runtimes such as Salesforce Agentforce. This is where lock-in risk is highest. Default verdict: hybrid, open-source framework plus configuration.
  3. Tool integrations. The connectors exposing external systems (CRM, ERP, databases, APIs, web) to the agent. Generic integrations (Salesforce, Jira, Slack) are commoditized; custom integrations for proprietary internal systems and legacy databases require build effort proportional to integration complexity. Default verdict: hybrid, buy standard and build custom.
  4. Domain logic. The business rules, decision heuristics, and domain knowledge encoded into the agent's behavior: underwriting criteria, compliance checks, pricing logic, escalation thresholds. This is your differentiation, and it is almost always a build — it is the layer competitors cannot replicate by buying the same vendor. Default verdict: build, own your moat.
  5. Observability. Logging, tracing, evaluation, and monitoring for agent behavior: latency tracking, tool call audits, output quality scoring, and anomaly detection. Mature platforms (LangSmith, Weights & Biases, custom dashboards) exist; building from scratch here is rarely justified. Default verdict: buy via platform.

The framework in one sentence: Buy your commodities, hybridize your connective tissue, and build only what encodes a durable competitive advantage. In most enterprise deployments, layer 4 — domain logic — is the only layer where building is consistently justified.
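
One way to keep this framework operational rather than aspirational is to hold the default verdicts as a declarative table and audit the current estate against it. The sketch below is an illustration under that assumption; the layer names and the `audit` helper are ours, not a Kellton or vendor API.

```python
# The five-layer framework as a declarative default-verdict table.
# Any deviation from a default should be a documented, deliberate exception.

DEFAULT_VERDICTS = {
    "foundation_model":  "buy",     # LLM consumed via API
    "orchestration":     "hybrid",  # open-source framework plus configuration
    "tool_integrations": "hybrid",  # buy standard connectors, build custom ones
    "domain_logic":      "build",   # your moat; competitors cannot buy it
    "observability":     "buy",     # mature platforms already exist
}

def audit(actual: dict[str, str]) -> list[str]:
    """Return the layers whose current verdict deviates from the default."""
    return [layer for layer, verdict in DEFAULT_VERDICTS.items()
            if actual.get(layer) != verdict]
```

Running `audit` against your real architecture turns "are we deliberately hybrid?" from a debate into a diff.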

The data readiness gate

Before any build or buy commitment, there is a prior question that most organizations skip: is your data infrastructure ready to support an AI agent at production quality? The majority of agentic AI project failures that Gartner identifies trace back to this omission — teams discover data problems six months into deployment, not six weeks before launch.

The following checklist is not exhaustive, but completing it before committing to architecture will eliminate the most common failure modes. Each item maps to a class of production incident we have observed in enterprise deployments.

Data readiness gate — complete before architecture commitment

  1. Data lineage is documented for all agent-accessible sources. The agent must be able to reason about data provenance. Undocumented sources create hallucination risk and compliance exposure.
  2. Data classification is complete for all workflows the agent will touch. PII, PHI, and contractually confidential data must be identified before tool integration design — not after vendor selection.
  3. Retrieval quality has been benchmarked, not assumed. RAG-based agents are only as good as their retrieval pipeline. In our experience, teams that skip this step and build agent logic on top of an untested retrieval layer discover precision problems in production that require rearchitecting the pipeline under time pressure. Test precision and recall against representative queries before building agent logic on top.
  4. Data freshness requirements are mapped to agent decision types. An agent making time-sensitive operational decisions (inventory, pricing, routing) has different freshness requirements than one doing analytical summarization. Mismatches produce silent errors, not loud failures.
  5. An evaluation dataset exists for the target use case. You cannot assess agent quality without a ground truth dataset. Building one before deployment is non-optional for production readiness.
  6. Data access controls have been reviewed for the agent's identity. Agents act with the permissions of whatever identity they run as. Ensure least-privilege access is enforced — agents should not have broader data access than the task requires.
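
Checklist item 3, benchmarking retrieval rather than assuming it, can be made concrete with a small harness. In this sketch, `retrieve` and the gold query set are placeholders for your own retrieval pipeline and labeled evaluation data.

```python
# Minimal precision@k / recall@k harness for a retrieval pipeline.
# `retrieve` is any callable returning ranked document IDs for a query;
# `gold` maps representative queries to their known-relevant document IDs.

from typing import Callable

def benchmark_retrieval(
    retrieve: Callable[[str, int], list[str]],
    gold: dict[str, set[str]],
    k: int = 5,
) -> dict[str, float]:
    """Average precision@k and recall@k over a gold query set."""
    precisions, recalls = [], []
    for query, relevant in gold.items():
        results = retrieve(query, k)[:k]
        hits = sum(1 for doc_id in results if doc_id in relevant)
        precisions.append(hits / k)
        recalls.append(hits / len(relevant) if relevant else 0.0)
    n = len(gold)
    return {"precision@k": sum(precisions) / n, "recall@k": sum(recalls) / n}
```

Even a few dozen labeled queries run through a harness like this will surface the precision problems that otherwise appear as production hallucinations.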

Governing the hybrid: the seam is where projects fail

Hybrid AI agent architectures do not fail at the build layer or the buy layer in isolation. They fail at the seam — the interface between custom-built and off-the-shelf components. Governing that seam requires explicit decisions in three areas that most teams leave implicit.

Orchestration governance: define a single orchestration authority. In a hybrid architecture, every agent — whether custom-built or vendor-provided — must register with a single orchestration layer. This is typically an open-source framework (LangGraph, AutoGen) or a custom orchestration service. Vendor-native orchestration that cannot be subordinated to this layer is an integration liability. Establish this constraint before vendor evaluation, not after.

Observability governance: unified trace context across all agents. In a hybrid system, a single user request may traverse both a custom agent and a vendor agent. If these agents emit traces to different observability systems, debugging a production incident requires stitching together logs from two or more platforms — a process that doubles incident response time in our experience. Require all agents to propagate a shared trace context. For vendor agents, this may require wrapping the vendor's API in a thin observability proxy.

Team topology: name a seam owner. The team that built the custom agent does not own the vendor agent, and the vendor agent is not owned by anyone internal. This organizational gap is the most reliable predictor of operational incidents that persist. Name an explicit seam owner — typically a platform engineering team — with responsibility for integration health, shared observability, and vendor relationship escalation. Without a named owner, the seam is ungoverned by default.
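
The thin observability proxy around a vendor agent might look like the following sketch. `vendor_call` and `emit_span` are hypothetical stand-ins; a production version would sit on an OpenTelemetry-style tracer rather than raw dictionaries.

```python
# Sketch of wrapping a vendor agent call so it emits a span into the same
# trace context as custom agents. The span schema here is illustrative.

import time
import uuid
from typing import Any, Callable

def traced_vendor_call(
    vendor_call: Callable[..., Any],
    trace_id: str,
    emit_span: Callable[[dict], None],
    **kwargs: Any,
) -> Any:
    """Invoke a vendor agent while recording a span in the shared trace."""
    span = {
        "trace_id": trace_id,          # propagated from the incoming request
        "span_id": uuid.uuid4().hex,
        "component": "vendor-agent",
        "start": time.time(),
    }
    try:
        result = vendor_call(**kwargs)
        span["status"] = "ok"
        return result
    except Exception as exc:
        span["status"] = f"error: {exc}"
        raise
    finally:
        span["end"] = time.time()      # emitted even on failure
        emit_span(span)
```

The design choice that matters is the `finally` block: the vendor agent appears in your trace whether it succeeds, fails, or times out, so incident debugging never requires stitching two platforms' logs.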

The decision scorecard: three axes, one hybrid zone

For each capability you are evaluating — now and as new use cases emerge — score it against three axes. The combination determines whether it belongs in the buy, build, or hybrid zone of your architecture.

  • Generic workflow automation (email triage, meeting summaries, ticket routing). Uniqueness: low, identical across competitors. Data sensitivity: low–medium. Recommended: buy.
  • Standard system integrations (Salesforce, Jira, ServiceNow connectors). Uniqueness: low, commoditized connectors. Data sensitivity: medium. Recommended: buy.
  • Cross-system orchestration (multi-agent coordination across owned and vendor agents). Uniqueness: medium, the architecture is proprietary but the tooling is not. Data sensitivity: medium. Recommended: hybrid.
  • Proprietary data retrieval and reasoning (internal knowledge, historical records). Uniqueness: high, both the data and the retrieval logic are unique. Data sensitivity: high. Recommended: hybrid.
  • Domain logic and decision rules (underwriting, pricing, compliance, clinical protocols). Uniqueness: high, this is your product. Data sensitivity: high. Recommended: build.
  • Regulated data workflows (HIPAA, GDPR-sensitive, contractually confidential processing). Uniqueness: varies. Data sensitivity: critical, cannot leave your boundary. Recommended: build.

The hybrid zone is not a compromise — it is a deliberate architectural position. Capabilities in the hybrid zone typically use open-source or vendor orchestration frameworks but run on infrastructure you control, with domain-specific configuration and prompting that encodes your proprietary knowledge. The vendor provides the chassis; you provide the engine.
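
One way to make the scorecard repeatable for new use cases is a small routing function. The axis values and the `core_differentiator` flag below are illustrative assumptions, not a published rubric.

```python
# Sketch of the three-axis scorecard as a routing function.
# Axis values are lowercase strings: uniqueness in {"low","medium","high"},
# sensitivity in {"low","medium","high","critical"}.

def recommend(uniqueness: str, sensitivity: str,
              core_differentiator: bool = False) -> str:
    """Map a capability to the buy, build, or hybrid zone."""
    if sensitivity == "critical":
        return "build"    # regulated data cannot traverse vendor infrastructure
    if core_differentiator:
        return "build"    # domain logic: this is your product
    if uniqueness == "low":
        return "buy"      # commoditized capability, no durable advantage
    return "hybrid"       # proprietary architecture or data on shared tooling
```

Note that uniqueness alone does not decide build versus hybrid: proprietary retrieval and domain logic can both be high-uniqueness and high-sensitivity, and only the core-differentiator question separates them.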

What 2026 asks of engineering leaders specifically

The strategic question for CTOs and VPs of Engineering is not which agent platform to choose. It is how to build an organizational capability for ongoing hybrid architecture decisions as the agent market continues to evolve — and it will evolve faster in 2026 and 2027 than it did in 2024 and 2025.

The practical work of becoming an organization that can execute on hybrid AI agents is not primarily technical. It is architectural and organizational. Engineering leaders who succeed in 2026 and 2027 will be the ones who built two things alongside their agents: a repeatable decision process for where new capabilities belong in the stack, and a platform function that owns the seam — the integration layer between what you built and what you bought — with the same rigour they apply to production infrastructure.

The Gartner cancellation wave is not going to claim the enterprises that found the best vendor or built the cleverest custom system. It will claim the ones who accumulated technical and governance debt at the seam, accrued shadow AI spend outside any architectural review, and discovered their vendor's customization ceiling twelve months after the decision was irreversible.

You now have the framework to avoid that trajectory. The five layers give you a decision surface. The data readiness gate gives you a pre-commitment discipline. The governance model gives you the seam. What you do with it is an execution problem — and execution problems are solvable.

A note on Kellton's AI practice: The framework described in this piece is vendor-neutral by design — the layer decomposition and governance model apply regardless of which stack you are running. For organizations where the orchestration and tool integration layer is a bottleneck, Kellton's AI practice has built production hybrid architectures across financial services, healthcare, and logistics environments. KAI, Kellton's enterprise-grade Agentic AI platform launched in 2025, is designed to accelerate work at the orchestration and integration layers while preserving build flexibility where it matters most.

The deliberate hybrid is a choice, not a destination

Forty-seven percent of enterprises are already hybrid. The question is whether that state was arrived at through deliberate architectural decisions or through accumulated vendor purchases and one-off custom builds that are now running alongside each other without shared governance.

The failure Kate Jensen described at Anthropic was not a failure of technology. It was a failure of approach. The approach, it turns out, is architecture.

Kellton's AI practice runs layer decomposition workshops for enterprise engineering teams — typically a half-day engagement that produces a documented architecture decision and a seam governance plan. We do not recommend vendors; we help you decide which layers to own. 
