--- id: architecture-trends related: - technical-foundations - platforms-enterprise - cybersecurity-ai-threats - platforms-consumer-developer key_findings: - "Chat-to-orchestrator evolution is the defining architectural shift of 2025-2026" - "MCP is winning as the universal agent-tool protocol — Linux Foundation governance, all major platforms adopted" - "57% of organizations have agents in production (LangChain Dec 2025) — this is operational, not experimental" - "The gap between shipped and announced remains the primary filter for separating real from hype" --- # Architectural Trends: Orchestrators & Agents (2024–2026) **Research date:** March 22, 2026 **Scope:** The shift from point solutions to orchestrated/agent-based AI systems across the full stack — platforms, desktop agents, reasoning models, frameworks, user layer, and infrastructure. --- ## Table of Contents 1. [Orchestration-First Platforms](#1-orchestration-first-platforms) 2. [Desktop and OS-Level Agents](#2-desktop-and-os-level-agents) 3. [Long-Reasoning / "Thinking" Models](#3-long-reasoning--thinking-models) 4. [Agentic Frameworks and Platforms](#4-agentic-frameworks-and-platforms) 5. [Impact on the User-Visible Layer](#5-impact-on-the-user-visible-layer) 6. [Impact on the Infrastructure Layer](#6-impact-on-the-infrastructure-layer) 7. [Summary: What Is Shipping vs. Announced vs. Vaporware](#7-summary-what-is-shipping-vs-announced-vs-vaporware) --- ## 1. 
Orchestration-First Platforms ### What "Orchestrator" Means Architecturally An orchestrator in the AI context is a system that: - **Routes** incoming tasks to appropriate models or tools - **Calls tools/functions** on behalf of the model (web search, code execution, APIs, file I/O) - **Coordinates multi-model or multi-agent pipelines** (handoffs between specialized models) - **Maintains memory and state** across a session or across multiple asynchronous turns - **Controls agent loops**: decides when to iterate, when to call a sub-agent, when to return to the user The architectural shift is from _chat completion as the terminal output_ → _chat completion as one node inside a larger task graph_. --- ### Perplexity: From Search to General-Purpose Worker Perplexity launched **Perplexity Computer** on February 25, 2026, representing the most complete public articulation of the orchestrator-as-platform model. ([Perplexity Hub](https://www.perplexity.ai/hub/blog/introducing-perplexity-computer)) **Architecture:** - Core reasoning engine + a **multi-model routing layer** that dynamically assigns sub-tasks to the most capable model for each job - As of launch, the stack routes to: Gemini for deep research (spawning sub-agents), a proprietary image model ("Nano Banana"), Veo 3.1 for video, Grok for speed in lightweight tasks, and GPT-5.2 for long-context recall - Sub-agents run in **isolated compute environments** with access to a real filesystem, real browser, and tool integrations - Designed for **asynchronous, long-running tasks** (hours to months) — user can run dozens of Computers in parallel **Physical hardware companion:** Perplexity also sells a **Mac Mini M4-based Personal Computer** (~March 2026), bundling the agent software with local execution. The device adds a local layer (filesystem, calendar, installed apps) while routing heavy reasoning to Perplexity's cloud. 
Computer use (clicking, typing, navigating apps) is included but noted as imperfect for unstructured UIs. ([MindStudio analysis](https://www.mindstudio.ai/blog/what-is-perplexity-personal-computer-mac-mini-agent)) **Enterprise:** Computer for Enterprise was announced at the Ask 2026 developer conference (March 10, 2026), adding Slack integration and enterprise governance. ([VentureBeat](https://venturebeat.com/technology/perplexity-takes-its-computer-ai-agent-into-the-enterprise-taking-aim-at)) **Status: Shipping** (Consumer + Enterprise) --- ### ChatGPT: Plugins → GPTs → ChatGPT Agent The evolution arc is the clearest in the industry: | Era | Product | Architecture | Status | |---|---|---|---| | 2023 | Plugins | Third-party API calls from chat context | Retired | | 2023–2025 | Custom GPTs (GPT Store) | Bundled instructions + tools, no autonomous execution | Shipping | | Jan 2025 | Operator | Standalone browser-using agent; Computer-Using Agent (CUA) model | Retired as standalone | | Jul 2025 | ChatGPT Agent (Agent Mode) | Unified: Operator + Deep Research + ChatGPT in one model | **Shipping** | | Oct 2025 | Apps SDK / AgentKit | Developer toolkit for building apps that run inside ChatGPT chat | Shipping | | Mar 2026 | Agentic Commerce Protocol + Instant Checkout | AI agents completing purchases on behalf of users (Etsy live, Shopify coming) | **Shipping** | **Key architectural shift with ChatGPT Agent (July 2025):** OpenAI merged two previously separate systems — Operator's visual web interaction and Deep Research's synthesis — into a single unified model that dynamically chooses between visual navigation and text-based browsing. ([OpenAI announcement](https://openai.com/index/introducing-chatgpt-agent/)) **Operator launch (January 2025):** Powered by the Computer-Using Agent (CUA) model — GPT-4o vision + reinforcement learning on GUI interaction. Set records on WebArena and WebVoyager browser-use benchmarks. 
([OpenAI](https://openai.com/index/introducing-operator/))

**GPT-5 note:** o3 is described as "succeeded by GPT-5" in the OpenAI model docs, indicating the reasoning-vs-generalist model distinction is collapsing. ([OpenAI API docs](https://developers.openai.com/api/docs/models/o3))

**Status: Shipping** (ChatGPT Pro/Plus/Team)

---

### Claude / Anthropic: MCP as Protocol Standard

Anthropic's strategy has two interlocked components:

**1. Model Context Protocol (MCP)** — Open-sourced November 2024. MCP is a standardized two-way protocol for connecting AI applications to external data sources and tools. ([Anthropic announcement](https://www.anthropic.com/news/model-context-protocol))

Architecture:

- **Host application** (e.g., Claude Desktop, VS Code/Cursor) embeds an **MCP client**
- MCP client connects to one or more **MCP servers** (each exposing a specific capability: GitHub, Google Drive, Slack, Postgres, filesystem, etc.)
- Communication via JSON-RPC 2.0 over STDIO (local) or HTTP + Server-Sent Events (remote); stateful and two-way
- Sampling capability: MCP servers can request LLM completions _back_ from the client, effectively enabling agentic server-side workflows

**Ecosystem breadth (March 2026):** MCP is supported across Claude, ChatGPT, VS Code, Cursor, and many others. ([modelcontextprotocol.io](https://modelcontextprotocol.io/docs/getting-started/intro)) Microsoft added **native MCP support to Windows** in public preview (Ignite 2025). ([Microsoft TechCommunity](https://techcommunity.microsoft.com/blog/windows-itpro-blog/evolving-windows-new-copilot-and-ai-experiences-at-ignite-2025/4469466))

MCP has become the _de facto_ integration protocol standard — the "USB-C for AI," as described in the official docs.

**2. Claude Desktop + Computer Use** — Computer use (screenshot capture, mouse/keyboard control, desktop automation) is in beta via API. Available through the `computer-use-2025-11-24` beta flag. Claude Desktop supports one-click MCP extension installation ("Desktop Extensions").
([Anthropic API docs](https://platform.claude.com/docs/en/agents-and-tools/tool-use/computer-use-tool), [Skywork analysis](https://skywork.ai/blog/ai-agent/claude-desktop-productivity-features/)) **Status:** - MCP: **Shipping**, widely adopted - Claude Desktop MCP extensions: **Shipping** - Computer use: **Beta** (API-only today; no first-party Claude UI exposes it directly to consumer users without developer setup) --- ### Google Gemini: Extensions → Gemini Agent → Gemini 3 **Gemini Agent** (currently available): Multi-step task execution combining live web browsing, Deep Research, Google app integration (Calendar, Gmail, etc.), and Canvas for content creation. ([Google Gemini Agent page](https://gemini.google/overview/agent/)) **Gemini 3 Pro Preview** (November 2025): Explicitly designed as "the most powerful agentic model... as the core orchestrator for advanced workflows." Key new architectural primitives: ([Google Developers Blog](https://developers.googleblog.com/ko/building-ai-agents-with-google-gemini-3-and-open-source-frameworks/)) - `thinking_level` parameter: per-request reasoning depth control (high/medium/low) - **Thought Signatures**: encrypted representations of the model's internal reasoning state that can be passed back in conversation history — allows the agent to retain its exact chain-of-thought across multi-step tool calls without reasoning drift - `media_resolution` parameter for multimodal fidelity control - Native integration with LangChain, AI SDK, and other open-source frameworks **Gemini Code Assist Agent Mode** (2025): Autonomous multi-step coding across files; migrated from deprecated Tool Calling API to MCP in October 2025. ([Digital Applied](https://www.digitalapplied.com/blog/google-gemini-code-assist-agent-mode-guide)) **Status: Shipping** (Gemini Agent for consumers; Gemini 3 for developers) --- ## 2. 
Desktop and OS-Level Agents ### Claude Desktop + Computer Use **What it can do today (shipped):** - MCP integrations with one-click install for filesystem, GitHub, Slack, Google Drive, Postgres, etc. - File creation and editing (documents, spreadsheets, slide decks, PDFs) natively in chat - Computer use via API: screenshot → mouse/keyboard → screenshot loop for GUI automation **Architectural pattern:** The "agent loop" is explicit in Anthropic's docs — it's a program that handles communication between Claude and the execution environment, sending actions to the environment and returning screenshots/command outputs. Max iteration limits prevent runaway loops. ([Anthropic computer use docs](https://platform.claude.com/docs/en/agents-and-tools/tool-use/computer-use-tool)) **Claude Code / Claude Agent SDK:** Announced September 2025 — gives developers access to the same tools, context management, and permissions frameworks that power Claude Code (the coding agent). SDK supports sub-agents and hooks. ([Anthropic](https://www.anthropic.com/news/enabling-claude-code-to-work-more-autonomously)) **Limitations today:** Computer use is API-only; reliability degrades on non-deterministic or complex UIs; no native "Claude Desktop sees your screen by default" consumer product yet. 
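The agent loop described above can be reduced to a few lines of code. The sketch below is a minimal, self-contained illustration of the screenshot → action → screenshot cycle with a max-iteration guard; `call_model` and `Environment` are hypothetical stand-ins, not Anthropic's actual SDK surface.

```python
# Minimal sketch of the computer-use "agent loop": the program relays actions
# from the model to the execution environment and screenshots back to the
# model, with an iteration cap to prevent runaway loops.
# NOTE: `call_model` and `Environment` are stubs, not a real API.

from dataclasses import dataclass, field


@dataclass
class Environment:
    """Stub for the real desktop: executes actions, returns screenshots."""
    clicks: list = field(default_factory=list)

    def screenshot(self) -> str:
        return f"<screenshot after {len(self.clicks)} actions>"

    def execute(self, action: dict) -> None:
        self.clicks.append(action)


def call_model(history: list) -> dict:
    """Stub model: clicks twice, then declares the task complete."""
    actions_taken = sum(1 for m in history if m["role"] == "assistant")
    if actions_taken < 2:
        return {"type": "click", "x": 100, "y": 200}
    return {"type": "done"}


def agent_loop(task: str, env: Environment, max_iterations: int = 10) -> list:
    history = [{"role": "user", "content": task}]
    for _ in range(max_iterations):  # cap prevents runaway loops
        action = call_model(history)
        history.append({"role": "assistant", "content": action})
        if action["type"] == "done":
            break
        env.execute(action)  # act on the environment...
        # ...and feed the resulting screenshot back to the model
        history.append({"role": "user", "content": env.screenshot()})
    return history


env = Environment()
history = agent_loop("open the settings app", env)
```

A production loop differs mainly in the two stubs: `call_model` becomes a real API call with the computer-use tool enabled, and `Environment` drives an actual display. The control flow, including the iteration cap, is the same.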
--- ### Microsoft Copilot: Deep OS Integration Microsoft has gone furthest in integrating AI agents at the OS level, with a multi-layer architecture: **Windows 11 + Copilot (Ignite 2025):** - **"Ask Copilot" on the taskbar**: entry point to agents, apps, files, settings; launch/monitor/manage agents via `@` mentions - **Native MCP support** in Windows (public preview): standardized way for agents to connect with local apps and tools - **Agent connectors** for File Explorer and Windows Settings (built-in) - **Agent workspace** (private preview): contained, policy-controlled environment where agents operate in parallel sessions without disrupting the user - **Windows 365 for Agents**: secure Cloud PCs for running enterprise-grade agents; Manus AI, Fellou, GenSpark, Simular, TinyFish are early users ([Microsoft 365 Blog](https://www.microsoft.com/en-us/microsoft-365/blog/2025/11/18/microsoft-ignite-2025-copilot-and-agents-built-to-power-the-frontier-firm/)) **Microsoft 365 Copilot (Ignite 2025):** - Agent Mode in Word (GA), Excel, PowerPoint (Frontier program) - Excel Agent Mode supports **model choice** between Anthropic and OpenAI reasoning models — the first Microsoft product to expose cross-vendor model routing to end users - **Work IQ**: intelligence layer giving agents access to organizational context (people, docs, permissions) - **Agent 365**: unified control plane for enterprise agents — governance, policy management, monitoring, compliance ([Microsoft TechCommunity](https://techcommunity.microsoft.com/blog/windows-itpro-blog/evolving-windows-new-copilot-and-ai-experiences-at-ignite-2025/4469466), [Microsoft Copilot Blog](https://www.microsoft.com/en-us/microsoft-copilot/blog/copilot-studio/whats-new-in-microsoft-copilot-studio-november-2025/)) **Copilot Studio (2025 Wave 1):** Low-code agent builder; integration with 1,400+ systems via MCP, Power Platform connectors, and Microsoft Graph; computer use automation for web tasks; enterprise governance via 
Microsoft Purview and Sentinel. ([Microsoft Learn](https://learn.microsoft.com/en-us/power-platform/release-plan/2025wave1/microsoft-copilot-studio/)) **Status:** Agent Mode in Office: **Shipping** (Word GA, Excel/PPT via Frontier). Windows taskbar agent management: **Public preview**. Agent workspace: **Private preview**. Windows 365 for Agents: **Public preview**. --- ### Apple Intelligence Apple's approach is the most cautious and privacy-centric: **What is shipping today (iOS 18/macOS Sequoia):** - Writing tools (rewrite, proofread, summarize) across system apps - Clean Up in Photos (generative fill) - Smart Reply and thread summarization in Mail/Messages - Priority notifications - Siri improvements with onscreen awareness (understanding context of what's on screen) **Planned / in development:** - Siri's "personal context" capabilities (add this to contacts, book this appointment) — flagged as "in development" on Apple's own page ([Apple Intelligence](https://www.apple.com/apple-intelligence/)) - Tim Cook signaling **Visual Intelligence** as the defining feature of upcoming wearable AI devices (Bloomberg, February 2026) ([Bloomberg](https://www.bloomberg.com/news/newsletters/2026-02-22/apple-s-ai-wearables-push-what-to-expect-from-march-4-low-end-macbook-launch)) - iOS 26 referenced in reporting as continuing the rollout **Architectural distinction from competitors:** Apple's Private Cloud Compute model routes tasks to on-device models first; only escalates to server-side models (running on Apple Silicon) for complex requests. No data is logged on Apple's servers. This is a deliberate architectural choice to differentiate on privacy, at the cost of capability breadth vs. cloud-first competitors. **Apple does not use "MCP" or expose an agent API to third-party developers in the same way.** App Intents is the current mechanism for Siri integration; how this evolves into a broader agent layer remains announced-but-not-shipped. 
**Status:** Basic writing/photo features **Shipping**. Agentic Siri with cross-app actions: **Partial shipping** (basic) / **In development** (full cross-app execution). --- ### Anthropic MCP as Protocol Standard MCP deserves special treatment as an architectural layer, not just a product: **Protocol design:** - **MCP servers** expose tools (functions), resources (data), and prompts to MCP clients - **MCP clients** are embedded in host applications (Claude Desktop, VS Code, ChatGPT, etc.) - Transport: STDIO for local, HTTP+SSE for remote - **Sampling** (added post-launch): servers can request LLM completions from clients, enabling server-side agentic behavior **Adoption trajectory:** - Launched November 2024 (Anthropic open-source) - By March 2025: adopted by multiple AI clients beyond Claude - Windows native MCP support announced November 2025 - March 2026: supported across Claude, ChatGPT, VS Code, Cursor, and described as the universal integration standard ([modelcontextprotocol.io](https://modelcontextprotocol.io/docs/getting-started/intro)) **Why MCP matters architecturally:** It decouples tool implementation from model implementation. A tool author writes one MCP server; it works with any MCP-compatible AI client. This is the "USB-C" moment for AI integrations — eliminating the N×M integration problem. --- ## 3. Long-Reasoning / "Thinking" Models ### The Core Shift Before reasoning models, complex multi-step tasks required **agent chains**: a sequence of separate LLM calls, each handling one step, with orchestration code managing state and routing between calls. The brittleness was real: any step could fail, context got lost between calls, and debugging chains was painful. Reasoning models internalize the "chain" inside a single inference call. Instead of externalizing the reasoning steps into code, the model reasons through them as **thinking tokens** before producing the final output. 
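The chain-versus-single-call contrast above can be made concrete. In this hedged sketch, `llm` is a stub standing in for any provider's completion call; the prompts, the stub's behavior, and the token budget are illustrative, not a specific vendor's API.

```python
# Before/after sketch of the shift from externalized agent chains to a
# single reasoning-model call. `llm` is a stub for any completion API.

def llm(prompt: str, thinking_budget: int = 0) -> str:
    """Stub completion call; a real provider client would go here."""
    return f"answer({prompt[:20]!r}, thinking={thinking_budget})"


# Before: a brittle multi-step chain. State is threaded through code,
# any step can fail, and context is lost between calls.
def chained(task: str) -> str:
    found = llm(f"search: {task}")
    facts = llm(f"extract: {found}")
    analysis = llm(f"analyze: {facts}")
    return llm(f"summarize: {analysis}")


# After: one call. The intermediate steps become thinking tokens inside a
# single inference, budgeted rather than engineered.
def single_call(task: str) -> str:
    return llm(task, thinking_budget=8192)
```

The practical consequence: orchestration code shrinks from four failure points to one, and tuning shifts from chain logic to the thinking budget.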
--- ### OpenAI o1 / o3 / GPT-5 **o1 (September 2024):** First widely deployed reasoning model; introduced the "think longer = better performance" scaling property. **o3 (April 16, 2025):** Significant step change. Key specs: - SWE-bench Verified: **71.7%** (vs. 48.9% for o1) — real-world software engineering on GitHub issues - Codeforces Elo: **2727** (vs. 1891 for o1) - GPQA Diamond (expert science): **87.7%** - For the first time in the o-series, **agentically uses all ChatGPT tools**: web search, Python code execution, visual reasoning, image generation — trained to reason about when and how to use tools ([OpenAI](https://openai.com/index/introducing-o3-and-o4-mini/)) - Available in Chat Completions API and Responses API **GPT-5:** Successor to o3; available as of the February 2026 model deprecation announcement ([OpenAI Help Center](https://help.openai.com/fr-ca/articles/6825453-chatgpt-release-notes)). Marks the convergence of the o-series reasoning capabilities with the GPT-series conversational ability. **Impact on chains:** Tasks that previously required a 5-step agent chain (search → extract → analyze → compare → summarize) can now be handled in a single o3/GPT-5 call with tool access. The "more compute = better performance" RL scaling property means you can budget inference time instead of engineering chain logic. --- ### Claude Extended Thinking / Adaptive Thinking **Claude 3.7 Sonnet (February 2025):** First Claude model with user-accessible extended thinking. 
Key architecture: - `thinking: {type: "enabled", budget_tokens: N}` — explicitly set how many tokens the model can spend thinking - Users can toggle extended thinking mode on/off in Claude.ai - Transparent thought process visible to users (though more detached in tone than final output) - "Action scaling": improved capability to iteratively call functions, respond to environmental changes, continue until a task is complete ([Anthropic](https://www.anthropic.com/news/visible-extended-thinking)) **Claude Opus 4.6 (current):** Uses **adaptive thinking** (`thinking: {type: "adaptive"}`) with an effort parameter — the model self-regulates thinking depth based on task complexity. Manual budget_tokens is deprecated on Opus 4.6. Thinking blocks from previous turns are **preserved in context by default** (started in Opus 4.5), enabling cache optimization across multi-turn tool use. Supports **interleaved thinking**: the model thinks between tool calls, enabling more sophisticated reasoning about intermediate results. ([Anthropic API docs](https://platform.claude.com/docs/en/build-with-claude/extended-thinking)) --- ### Gemini 2.5 Pro / Gemini 3 **Gemini 2.5 Pro (March 25, 2025):** Google's first major thinking model release. - Dynamic thinking by default (adjusts budget based on complexity) - 1M token context window (2M planned) - SWE-bench Verified: **63.8%** with custom agent setup - GPQA Diamond and AIME 2025 SOTA without test-time majority voting - Described by Google as a model that "handles more complex problems and supports even more capable, context-aware agents" — building thinking into all future models ([Google Blog](https://blog.google/innovation-and-ai/models-and-research/google-deepmind/gemini-model-thinking-updates-march-2025/)) **Gemini 3 series (late 2025–2026):** Adds `thinking_level` parameter (high/medium/low/minimal) per-request. Thought Signatures for stateful multi-step tool use. 
Google's recommendation to developers: "Stop using complex Chain of Thought prompt engineering. Rely on the `thinking_level` parameter to handle reasoning depth natively." ([Google AI for Developers](https://ai.google.dev/gemini-api/docs/thinking), [Google Developers Blog](https://developers.googleblog.com/ko/building-ai-agents-with-google-gemini-3-and-open-source-frameworks/)) --- ### Impact on the Agent Framework Ecosystem The reasoning model shift is reshaping what agent frameworks are _for_: **Tasks that previously required agent chains but now can be single calls:** - Multi-document research synthesis (long context + reasoning replaces retrieve→summarize→merge loops) - Multi-step code generation, debugging, and refactoring (single SWE-bench call vs. Plan→Code→Test→Fix chains) - Complex analytical tasks requiring back-and-forth data processing - Many customer support triage workflows **What remains genuinely multi-agent:** - Tasks that require **parallel execution** (multiple sub-agents running simultaneously) - Tasks that require **true long-running async work** (hours/days; beyond a single context window) - Tasks that require **human-in-the-loop checkpoints** at specific decision gates - Multi-system orchestration where different systems need different auth contexts / isolation - Enterprise workflows with **governance requirements**: audit trails, policy enforcement at each step **Net effect on LangChain/CrewAI/AutoGen:** The frameworks are seeing increased adoption (LangChain 220% GitHub star growth, 300% download growth Q1 2024–Q1 2025), but the nature of what's built has shifted: fewer "chain 5 GPT-3.5 calls" patterns, more "one powerful reasoning model + durable state management + human-in-the-loop controls." LangGraph in particular has seen its role shift from "orchestrate many small calls" to "manage state, branching, and recovery for long-running agents." 
([Info Services LangChain analysis](https://www.infoservices.com/blogs/artificial-intelligence/langchain-multi-agent-ai-framework-2025)) --- ## 4. Agentic Frameworks and Platforms ### Developer Frameworks #### LangChain + LangGraph **LangChain 1.0** (October 22, 2025): Stable release after 2+ years of iteration. Key changes: - `create_agent` abstraction: fastest path to a tool-calling agent with any model provider - Standard content blocks: provider-agnostic spec for model outputs - Built on LangGraph runtime - Adopters include Uber, LinkedIn, Klarna, Rippling ([LangChain Blog](https://blog.langchain.com/langchain-langgraph-1dot0/)) **LangGraph 1.0** (October 22, 2025): First stable major release in the "durable agent framework" space. - Graph-based execution model with explicit state management - Production features: persistence, observability, human-in-the-loop - Designed for long-running agents, branching business logic, complex workflows requiring oversight - **When to use LangChain vs. LangGraph:** LangChain for fast prototyping with standard patterns; LangGraph for fine-grained control, durable state, complex branching. LangChain agents are built on LangGraph, so you can migrate incrementally. **Survey data (December 2025 LangChain State of Agent Engineering report, n=1,300+):** - 57.3% of organizations have agents in production (up from 51% in 2024) - 30.4% actively developing with concrete plans to deploy - 67% of 10k+ employee orgs have agents in production - Top use cases: Customer service (26.5%), Research & data analysis (24.4%), Internal workflow automation (18%) - #1 barrier: Quality of outputs (32%) - 75%+ of respondents use multiple models; 60% of AI developers working on agents use LangChain as primary orchestration layer - Observability: 89% have implemented it — now considered table stakes ([LangChain State of Agent Engineering](https://www.langchain.com/state-of-agent-engineering)) #### CrewAI Role-based multi-agent orchestration. 
Optimized for **deterministic multi-agent workflows** where different agents play defined roles (researcher, writer, reviewer, etc.). Self-described as achieving a 30% reduction in development time for such workflows. Python-first, open source core. ([Sparkco AI comparison](https://sparkco.ai/blog/crewai-vs-autogen-multi-agent-orchestration-2025))

#### AutoGen / AG2

Microsoft Research's framework, now maintained as AG2. Emphasis on **dynamic task allocation** and heterogeneous agent environments. Better suited to iterative reasoning with human approval gates and environments that require dynamic scaling. A claimed 20% decrease in operational costs vs. alternatives in dynamic scenarios. ([Sider.ai comparison](https://sider.ai/blog/ai-tools/crewai-vs-autogen-which-multi-agent-framework-wins-in-2025))

#### OpenAI Agents SDK

Released March 11, 2025, replacing the experimental Swarm SDK. A production-ready upgrade with:

- Agents as first-class primitives (LLMs + instructions + tools)
- **Handoffs**: agents delegating to other agents for specific tasks
- **Guardrails**: validation of agent inputs and outputs
- Built-in tracing and observability
- Works with OpenAI models and any Chat Completions-style API (including third-party)
- Intended to replace the Assistants API (deprecation target: mid-2026)

([OpenAI](https://openai.com/index/new-tools-for-building-agents/), [InfoQ](https://www.infoq.com/news/2025/03/openai-responses-api-agents-sdk/))

**Responses API** (March 2025, GA): Replaces Chat Completions for agentic use cases. Supports built-in tools (web search, file search, code interpreter, computer use), reasoning summaries, and preserved reasoning tokens around function calls.
([OpenAI Agents SDK docs](https://openai.github.io/openai-agents-python/)) #### Framework Comparison | Framework | Model | Best For | Coordination | Production Fit | |---|---|---|---|---| | LangGraph | Open source | Complex branching, durable state | Graph state machine | Strong (used at Uber, LinkedIn, Klarna) | | LangChain | Open source | Fast prototyping, standard patterns | High-level abstractions | Strong, built on LangGraph | | CrewAI | Open source + commercial | Role-based multi-agent workflows | Role & task orchestration | Strong for deterministic flows | | AutoGen/AG2 | Open source | Dynamic, heterogeneous environments | Conversation-based | Strong for R&D/iterative tasks | | OpenAI Agents SDK | Open source | OpenAI-stack native, cross-provider | Handoffs + guardrails | Strong, production-ready | --- ### Low-Code Agent Builders **n8n** (workflow-based): Added 70+ AI nodes in 2025. 450+ integrations, 1,000+ community workflows, supports all major models. Unique in combining traditional automation with AI agents in one platform. Self-hostable. ([n8n Blog](https://blog.n8n.io/best-ai-agent-builders/)) **Flowise** (LLM-native, open source): Visual drag-and-drop builder specifically for LLM applications. **Acquired by Workday in August 2025**. Self-hostable. ([The Vibe Marketer](https://www.thevibemarketer.com/guides/ai-agent-builders-2025)) **Dify**: Rapid AI app prototyping with minimal setup; pre-configured AI features, built-in publishing. Cloud and self-hosted. **Key trend:** Flowise's acquisition by Workday signals enterprise interest in absorbing the low-code agent builder layer directly. --- ### Enterprise Agent Platforms #### Salesforce Agentforce **Timeline:** - **September 2024 (Dreamforce):** Agentforce 1.0 announced; GA October 25, 2024. $2/conversation pricing. Atlas Reasoning Engine. 
- **December 17, 2024:** Agentforce 2.0 — pre-built skills for Slack, Tableau, CRM; Headless AI Agents; advanced NLP; enterprise security/governance - **March 2025 (TrailblazerDX):** Agentforce 2dx — proactive/autonomous operation "behind the scenes"; multimodal (text/voice/vision); AgentExchange marketplace; deeper Data Cloud integration - **October 2025 (Dreamforce 2025):** Agentforce 360 GA — the company's "full platform designed to connect humans and AI agents in one trusted system"; Agentforce Voice; Agent Script for hybrid deterministic+LLM workflows **Metrics:** 5,000+ deals closed in first 6 months; $900M+ AI and Data Cloud revenue; 10,000+ agents created at Dreamforce 2024 in 3 days; deployments at Indeed, OpenTable, Formula 1, Heathrow, Finnair ([Cyntexa](https://cyntexa.com/blog/agentforce-statistics-and-trends/), [Salesforce investor release](https://investor.salesforce.com/news/news-details/2025/Welcome-to-the-Agentic-Enterprise-With-Agentforce-360-Salesforce-Elevates-Human-Potential-in-the-Age-of-AI/default.aspx)) **Agent Builder:** Low-code; defines agent topics, natural language instructions, and a library of actions. Integrates Flows, Apex, MuleSoft APIs. 
([Salesforce](https://www.salesforce.com/agentforce/)) **Status: Shipping** (all above releases are GA) #### ServiceNow ServiceNow's approach positions AI agents as an "operational layer inside the enterprise" rather than chat assistants: - **AI Agent Orchestrator**: coordinates specialized agents across systems and workflows - **AI Agent Studio**: create agents, define execution plans, set triggers, test outcomes - **Yokohama release:** AI Agents and Agent Studio - **Zurich release:** Agentic playbooks woven into tasks and workflows; Build Agent for AI-powered app development - 36% of global AI "Pacesetters" already using agentic AI; 43% considering adoption in next 12 months ([ServiceNow](https://www.servicenow.com/workflow/it-transformation/whats-next-ai-2026.html), [Winklix analysis](https://www.winklix.com/blog/how-servicenow-ai-agents-are-transforming-enterprise-workflows-in-2026/)) **Status: Shipping** #### IBM, UiPath, and Legacy RPA Players **RPA vs. Agentic AI — The Core Distinction:** | Dimension | RPA | Agentic AI | |---|---|---| | Instruction model | Explicit script; deterministic rule-based | Goal-oriented; reasons and adapts | | Handles unstructured data | No | Yes (via LLM) | | Can make decisions | No | Yes | | Handles exceptions | Requires human intervention | Can reason through exceptions | | Learns/adapts | No | Yes (within session; increasingly across sessions) | | Best for | High-volume, stable, predictable processes | Complex, exception-heavy, dynamic workflows | **The convergence pattern (2025-2026):** Enterprise automation increasingly uses RPA as the *execution layer* (for UI-level actions on legacy systems without APIs) while AI agents handle the *planning and decision layer*. UiPath added AI Agent capabilities; IBM watsonx positioned as the AI reasoning layer on top of RPA. 
([Thomson Reuters](https://tax.thomsonreuters.com/blog/ai-agents-versus-rpa-a-guide-for-accountants-tri/), [IBM Community](https://community.ibm.com/community/user/blogs/ahmed-alsareti/2025/11/04/rpa-vs-agentic-ai-transforming-enterprise-automati/)) The key differentiator from traditional RPA: **AI agents automate outcomes; RPA automates tasks.** ([RT Insights](https://www.rtinsights.com/rpa-vs-ai-automation-is-robotic-process-automation-being-replaced/)) --- ## 5. Impact on the User-Visible Layer ### Are Front-Ends Consolidating? The data says: **not a direct replacement, but a power shift**. ChatGPT reached **700 million weekly active users as of August 2025** — this puts it in the same tier as major social platforms. ([Forbes Tech Council](https://www.forbes.com/councils/forbestechcouncil/2025/10/31/will-chatgpt-replace-your-favorite-apps-why-it-depends-on-ui-not-ai/)) That scale makes the chat interface competitive with dedicated apps for _many_ tasks. **What is actually happening:** 1. **AI assistants are capturing the "discovery" and "initiation" layer**: Users increasingly start in ChatGPT/Claude/Gemini rather than going directly to a specialized app. The AI then either completes the task itself or routes to the specialized service. 2. **Specialized apps become "invisible APIs"**: Etsy, Shopify merchants, OpenTable, Instacart are integrating with ChatGPT's Agentic Commerce Protocol (shipping March 2026). The brand experience happens inside ChatGPT. This is the "unnoticed infrastructure" risk — apps become API endpoints fed into someone else's UI. ([OpenAI](https://help.openai.com/fr-ca/articles/6825453-chatgpt-release-notes)) 3. **Vertical apps are not dying**: Tasks requiring governance, regulatory compliance, complex domain-specific UX (booking 50 employees into flights, processing insurance claims with audit trails) still require dedicated platforms. "AI excels at the first step, but specialized platforms prevail in the subsequent actions." 
([Forbes Tech Council](https://www.forbes.com/councils/forbestechcouncil/2025/10/31/will-chatgpt-replace-your-favorite-apps-why-it-depends-on-ui-not-ai/)) ### "One Assistant" vs. "Best of Breed" The debate is real but the market is settling into a **tiered coexistence**: - **Tier 1 — Universal assistants** (ChatGPT, Claude, Gemini, Perplexity Computer): Handle broad tasks; orchestrate other tools; best UX for unstructured / open-ended work - **Tier 2 — Enterprise orchestrators** (Salesforce Agentforce, ServiceNow, Microsoft 365 Copilot): Deep integration with CRM/ERP/IT systems; governance, audit, compliance built-in; best for structured business workflows - **Tier 3 — Single-purpose apps**: Survive where trust, domain expertise, UX specificity, or regulation create genuine moats (legal, medical, financial, CAD, etc.) The number of _installed apps_ is not decreasing — but the number of _workflows that start in a specialized app_ is declining as users delegate the "start" to an assistant. The risk for app makers: losing the initial intent capture → losing upsell opportunities → becoming commodity infrastructure. --- ## 6. Impact on the Infrastructure Layer ### Model Routing: The Rise of Multi-Model Stacks **OpenRouter 2025 State of AI** (100 trillion token empirical study): Key findings relevant to architecture: - **No single model dominates**: Proprietary vs. 
OSS is now roughly 70/30, with OSS steadily gaining (was negligible in 2023)
- **Routing by task type**: Anthropic Claude → programming/technology (>80% share in that category); DeepSeek/Qwen → roleplay and high-volume low-cost tasks; Google → translation/science
- **Tool calling adoption trending sharply upward throughout 2025**: First concentrated on GPT-4o-mini and Claude 3.5/3.7, diversifying to Claude 4.5 Sonnet, Grok, GLM 4.5 by mid-2025
- **Sequence lengths nearly tripling**: Average request went from ~2K to ~5.4K tokens (~2.7× growth); programming requests 3–4× longer — driven by agentic multi-turn patterns
- **Reasoning models now >50% of tokens** (from negligible in early 2025)

([OpenRouter State of AI](https://openrouter.ai/state-of-ai))

**Practical implications:**

- Models without reliable tool calling formats are falling behind in enterprise adoption
- The "glass slipper" effect: developers test many models but lock to the first-fit; boomerang returns (e.g., DeepSeek) show ongoing testing across providers
- Infrastructure must handle long-running conversations, tool chains, and heterogeneous model stacks

**AI inference market:** $106 billion in 2025, headed to $255 billion by 2030. Inference has overtaken training at 55% of cloud AI spend. Average enterprise LLM spend: $7M/company in 2025 (vs. $2.5M in 2024).
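The routing-by-task-type findings above can be sketched as a static routing table sitting in front of an OpenAI-compatible gateway. The model slugs, category names, and fallback chain here are illustrative assumptions, not OpenRouter's actual routing logic.

```python
# Illustrative routing-by-task-type sketch; the model slugs and the
# category mapping are assumptions, not a real production table.
ROUTING_TABLE = {
    "programming": "anthropic/claude-sonnet-4.5",  # code-heavy tasks
    "translation": "google/gemini-3-pro",          # translation/science
    "bulk":        "deepseek/deepseek-chat",       # high-volume, low-cost
}
FALLBACKS = ["openai/gpt-5", "meta-llama/llama-3.1-70b-instruct"]

def route(task_type: str) -> list[str]:
    """Return an ordered list of model slugs to try: the category's
    preferred model first, then generic fallbacks (hot-swap on failure)."""
    preferred = ROUTING_TABLE.get(task_type)
    return ([preferred] if preferred else []) + FALLBACKS

# Each slug would be passed as the `model` field of a request to an
# OpenAI-compatible gateway endpoint (e.g. a /chat/completions route).
assert route("programming")[0] == "anthropic/claude-sonnet-4.5"
```

The fallback list is what makes the stack hot-swappable: a provider outage or regression means trying the next slug, with no change to the calling code.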
([SoftwareSeni inference market analysis](https://www.softwareseni.com/the-ai-inference-market-in-2025-hardware-consolidation-pricing-wars-and-what-it-means-for-buyers/)) --- ### Model-as-a-Service Players | Provider | Role | Status | |---|---|---| | **OpenRouter** | Model routing gateway; unified API across 200+ models; routes by capability, latency, cost | Shipping; 100T tokens analyzed in 2025 report | | **Together AI** | Full-stack inference platform: dedicated inference, fine-tuning, GPU clusters; 2× faster inference vs alternatives, 60% cost reduction | Shipping; 45% gross margin | | **Groq** | LPU-based inference chip (purpose-built for speed); 877 tokens/sec on Llama 3 8B; **acquired by NVIDIA in late 2025** | Shipping (pre-acquisition) | --- ### Tool/Function Calling as Universal Interface Function calling (also called "tool use") has emerged as the architectural primitive that makes models orchestratable. Before function calling, integrating LLMs with external systems required complex prompt engineering and output parsing. With standardized function calling: - **Schema definition**: Tools are described as JSON schemas with names, descriptions, and parameter types - **Model decides when to call**: `tool_choice: "auto"` — the model reasons about whether a function call is needed - **Universal across providers**: OpenAI, Anthropic, Google, and open-source models (Llama, Mistral) all support function calling; syntax differs slightly but patterns are isomorphic **Cross-provider abstraction (LiteLLM pattern):** A single `tools` array definition works across GPT-4, Claude, Gemini, and Llama via LiteLLM — enabling true model-agnostic agent code. ([DEV Community](https://dev.to/qvfagundes/function-calling-and-tool-use-turning-llms-into-action-taking-agents-30ca)) **Berkeley Function Calling Leaderboard (BFCL):** Emerged as the de facto standard for evaluating tool-use capability. 
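The cross-provider LiteLLM pattern above can be sketched as one JSON-schema tool definition reused verbatim across model strings. The weather tool, the specific model slugs, and the `build_request` helper are hypothetical; only the request shape (`tools` array, `tool_choice: "auto"`) follows the standard function-calling convention.

```python
# Sketch of a provider-agnostic tools array; the get_weather tool and
# model strings are toy assumptions, not a real integration.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}]

def build_request(model: str, user_msg: str) -> dict:
    """Kwargs for a LiteLLM-style completion call; the same tools array
    is reused unchanged across GPT, Claude, Gemini, and Llama slugs."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
        "tools": TOOLS,
        "tool_choice": "auto",  # the model decides whether a call is needed
    }

# Identical agent code, different providers:
for model in ("gpt-4o", "claude-sonnet-4-5", "gemini/gemini-2.0-flash"):
    req = build_request(model, "Is it raining in Oslo?")
    # response = litellm.completion(**req)  # requires API keys; omitted here
```

This is the model-agnostic agent code the section describes: swapping providers changes only the `model` string, never the tool schema or the agent loop.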
The BFCL suite tests: simple function calling, parallel function calling, multiple function selection, relevance detection (knowing when NOT to call), and multi-turn interactions. ([Klavis AI](https://www.klavis.ai/blog/function-calling-and-agentic-ai-in-2025-what-the-latest-benchmarks-tell-us-about-model-performance))

**MCP as the next layer up:** Where function calling is the model-level primitive, MCP is the infrastructure-level standard — allowing tools to be registered, discovered, and called across a standardized protocol without per-model implementation. The two layers are complementary: the model uses function calling to invoke tools; MCP standardizes how those tools are exposed and connected.

---

### Modularity and Hot-Swappability

The production AI stack is becoming increasingly modular, enabling independent evolution of each layer:

**Pattern (production multi-adapter architecture):**

1. Foundation Model (reasoning core)
2. Domain Adapters (specialized delta weights, hot-swappable at inference time)
3. Routing & Orchestration Layer (classifies queries, assigns to optimal adapter)
4. Inference & Serving Infrastructure (vLLM, dynamic batching, stateless nodes)
5. Guardrails, Validation & Observability

([Innova Solutions architecture writeup](https://innovasolutions.com/blog/building-production-grade-ai-inside-our-modular-multi-adapter-architecture/))

**Hot-swappability is real**: Adapter weights are small delta footprints; they can be updated independently without touching the base model. A domain regression (e.g., legal domain output degraded) can be fixed by retraining only the legal adapter and deploying without touching any other domain.

**What is hot-swappable today vs.
theoretically:** - ✅ **Model provider** (switch Claude ↔ GPT-5 ↔ Gemini behind an OpenRouter or LiteLLM facade) - ✅ **Tools/MCP servers** (add/remove without changing model) - ✅ **Domain adapters** (LoRA/fine-tune layers for enterprise deployments) - ✅ **Orchestration framework** (LangChain → LangGraph → custom, when same tool interface preserved) - ⚠️ **Memory/context store** (vector DBs are swappable but migration is complex; schema lock-in exists) - ❌ **Agent behavior/personality** (still coupled to specific model training) --- ## 7. Summary: What Is Shipping vs. Announced vs. Vaporware | Feature / Platform | Status | Notes | |---|---|---| | **Perplexity Computer** (multi-agent, multi-model) | ✅ **Shipping** | Consumer + Enterprise (March 2026) | | **Perplexity Personal Computer** (Mac Mini + agent) | ✅ **Shipping** | Hardware requires macOS; computer use imperfect for complex UIs | | **ChatGPT Agent Mode** (Operator + Deep Research unified) | ✅ **Shipping** | Pro/Plus/Team as of July 2025 | | **ChatGPT Agentic Commerce / Instant Checkout** | ✅ **Shipping** | Etsy live; Shopify (1M merchants) imminent as of March 2026 | | **OpenAI Responses API + Agents SDK** | ✅ **Shipping** | Replaced Swarm; Assistants API deprecated mid-2026 | | **Anthropic MCP (protocol + Claude Desktop)** | ✅ **Shipping** | Open protocol; broadly adopted | | **Claude Computer Use** | ⚠️ **Beta / API only** | No consumer-facing computer use without developer setup | | **Claude Opus 4.6 adaptive thinking** | ✅ **Shipping** | Interleaved thinking across tool calls | | **Gemini Agent** (multi-step, Google apps integration) | ✅ **Shipping** | Consumer | | **Gemini 3 Pro (Thought Signatures, thinking_level)** | ✅ **Shipping** | Developer API / Vertex AI | | **Google Gemini Code Assist Agent Mode** | ✅ **Shipping** | VS Code, JetBrains IDEs | | **Microsoft Copilot Agent Mode in Word** | ✅ **Shipping** | GA | | **Microsoft Copilot Agent Mode in Excel/PowerPoint** | ⚠️ **Frontier program** | Not 
broadly available | | **Windows native MCP support** | ⚠️ **Public preview** | | | **Windows Agent Workspace** | ⚠️ **Private preview** | | | **Apple Intelligence (writing tools, Smart Reply)** | ✅ **Shipping** | iOS 18 / macOS Sequoia | | **Apple Siri cross-app agentic actions** | ⚠️ **In development** | Explicitly flagged on Apple's page | | **LangChain 1.0 / LangGraph 1.0** | ✅ **Shipping** | October 2025 | | **OpenAI o3 / GPT-5** | ✅ **Shipping** | o3 April 2025; GPT-5 succeeds o3 | | **Salesforce Agentforce 360** | ✅ **Shipping** | October 2025 GA; 5,000+ enterprise deals | | **ServiceNow AI Agent Orchestrator** | ✅ **Shipping** | Yokohama + Zurich releases | | **Flowise (acquired by Workday)** | ✅ **Shipping** | Under Workday ownership as of August 2025 | | **OpenRouter multi-model routing** | ✅ **Shipping** | 200+ models; 100T tokens/year run-rate | --- ## Key Architectural Conclusions 1. **The chat interface is now the orchestration layer entry point**, not the terminal output. Every major platform (ChatGPT, Claude, Gemini, Perplexity) has evolved its chat UI into a task executor that routes work to tools, sub-agents, and external services. 2. **MCP is winning as the integration standard** — adopted by Claude, ChatGPT, VS Code, Cursor, Windows, and many others. It is functionally replacing bespoke API integrations for AI-to-tool connectivity. 3. **Reasoning models are collapsing agent chains** — tasks that required 5-step pipelines in 2023 are now handled in single inference calls with tool access. This raises the bar for when a framework-based multi-agent approach is actually necessary. 4. **Multi-agent is still necessary for**: parallel async execution, long-running durable workflows, genuine multi-department cross-system orchestration, and governance-heavy enterprise processes. 5. 
**The infrastructure layer is converging on modularity**: hot-swappable models (OpenRouter/LiteLLM), hot-swappable tools (MCP), hot-swappable domain adapters (LoRA), and vendor-agnostic function calling interfaces. 6. **Front-end is not dying but is being disintermediated at the top of the funnel**: Universal assistants are capturing task initiation; specialized apps retain the execution layer for high-trust, high-complexity, regulated workflows. Brands risk becoming invisible API endpoints. 7. **The enterprise adoption curve is ahead of the consumer narrative**: 57% of organizations have agents in production (LangChain survey, Dec 2025); Salesforce has 5,000+ Agentforce deals; ServiceNow is mainstream in IT/HR/security workflows. --- *Research compiled March 22, 2026. All claims include direct source citations. Sources reflect primary platform announcements, technical documentation, and empirical usage reports.*