Architecture Pattern

🤖 AI-Powered Product (Chat/Agent)

Build AI chatbots, document analyzers, code generators, and multi-agent workflows using Vercel's AI Cloud layer. The building blocks are the AI SDK for model abstraction, AI Gateway for model routing, Fluid Compute for cost-efficient streaming, and Workflows for durable agents.

Why Vercel for AI Apps?

🔄 Provider Agnostic

The AI SDK lets you switch between OpenAI, Anthropic, and Google by changing the model line. No rewrite, no business logic changes.
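The one-line swap works because application code depends only on a model abstraction, not on a vendor SDK. The sketch below illustrates the idea with a hypothetical ChatModel interface and stub providers (openaiStub, anthropicStub); it is not the AI SDK's actual type. In the AI SDK itself you would pass a provider model such as openai('gpt-4o') instead.

```typescript
// Hypothetical interface standing in for a provider-agnostic model abstraction.
interface ChatModel {
  readonly id: string;
  generate(prompt: string): string;
}

// Stub providers: real ones would call a vendor API over the network.
const openaiStub: ChatModel = {
  id: "openai/gpt-4o",
  generate: (prompt) => `[openai] ${prompt}`,
};

const anthropicStub: ChatModel = {
  id: "anthropic/claude",
  generate: (prompt) => `[anthropic] ${prompt}`,
};

// Business logic only sees the interface, so it never changes.
function summarize(model: ChatModel, text: string): string {
  return model.generate(`Summarize: ${text}`);
}

// Swapping providers is a one-line change at the call site.
const activeModel: ChatModel = anthropicStub; // was: openaiStub
const summary = summarize(activeModel, "quarterly report");
```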

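💰 Up to 99% Lower CPU Cost

Fluid Compute bills CPU only while your code is actively executing, not while it waits on the model. A 30s stream that uses 200ms of CPU is billed for 200ms of CPU time (provisioned memory is billed separately), which is a major saving for I/O-bound AI workloads.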

🛡️ Durable Agents

The "use workflow" directive provides checkpointed steps, so multi-step agents survive function timeouts and resume automatically.
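To make "checkpointed steps" concrete, here is a minimal in-memory sketch of the replay idea: each named step's result is saved once it succeeds, so a rerun after a crash skips finished steps and resumes at the first incomplete one. The CheckpointStore type and runStep helper are hypothetical and not part of Vercel's workflow API; a real durable runtime persists step results outside the function instance rather than in a Map.

```typescript
// Hypothetical checkpoint store: step name -> saved result.
// A real durable runtime would persist this outside the function instance.
type CheckpointStore = Map<string, unknown>;

// Counts how many times each step body actually executed (for illustration).
const executions: Record<string, number> = {};

function runStep<T>(store: CheckpointStore, name: string, body: () => T): T {
  if (store.has(name)) {
    // Step finished in an earlier attempt: replay the saved result.
    return store.get(name) as T;
  }
  executions[name] = (executions[name] ?? 0) + 1;
  const result = body();
  store.set(name, result); // checkpoint only after success
  return result;
}

// A three-step "agent" whose second step fails on the first attempt,
// simulating a timeout or crash mid-run.
let failNextDraft = true;

function agent(store: CheckpointStore): string {
  const research = runStep(store, "research", () => "notes");
  const draft = runStep(store, "draft", () => {
    if (failNextDraft) {
      failNextDraft = false;
      throw new Error("function timed out");
    }
    return `draft from ${research}`;
  });
  return runStep(store, "review", () => `${draft} (reviewed)`);
}

const store: CheckpointStore = new Map();
let output = "";
try {
  output = agent(store); // first attempt crashes in "draft"
} catch {
  output = agent(store); // resume: "research" is replayed, not re-run
}
```

After the resume, "research" has executed once, "draft" twice (one failed attempt plus the retry), and "review" once.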

🌍 Global Low Latency

Vercel's Edge Network plus regional functions help keep time to first token low for users worldwide.

Architecture Layers

Chat UI (Next.js + useChat)

The useChat() hook handles streaming message rendering, sending messages, and message history. React Server Components render the page shell quickly.
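Streaming message rendering amounts to appending each incoming text chunk to the assistant message being built, then re-rendering. The reducer below is a simplified, hypothetical model of that bookkeeping (the ChatMessage and StreamEvent shapes are invented for illustration); the real useChat() state and stream protocol are richer, for example messages carry typed parts.

```typescript
// Simplified, hypothetical message and stream-event shapes.
interface ChatMessage {
  id: string;
  role: "user" | "assistant";
  text: string;
}

type StreamEvent =
  | { type: "start"; id: string }
  | { type: "text-delta"; id: string; delta: string };

// Pure reducer: returns a new history array so React can re-render.
function applyEvent(history: ChatMessage[], event: StreamEvent): ChatMessage[] {
  switch (event.type) {
    case "start":
      // Open an empty assistant message that deltas will fill in.
      return [...history, { id: event.id, role: "assistant", text: "" }];
    case "text-delta":
      // Append the chunk to the matching message, leaving others untouched.
      return history.map((m) =>
        m.id === event.id ? { ...m, text: m.text + event.delta } : m,
      );
  }
}

const events: StreamEvent[] = [
  { type: "start", id: "a1" },
  { type: "text-delta", id: "a1", delta: "Hel" },
  { type: "text-delta", id: "a1", delta: "lo!" },
];

const initial: ChatMessage[] = [{ id: "u1", role: "user", text: "Hi" }];
// Each intermediate state is what the UI would show mid-stream.
const finalHistory = events.reduce(applyEvent, initial);
```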

Streaming API (Vercel Functions)

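streamText() from the AI SDK sends LLM tokens to the client progressively as they are generated. Because a stream spends most of its duration waiting on the model, Fluid Compute's active-CPU billing cuts billed CPU time by more than 90% in the scenarios in the cost table below.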
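Progressive delivery means writing each token to the response body as soon as it arrives instead of buffering the whole completion. The sketch below shows that pattern with the standard Web Streams API, using a hypothetical fakeTokens generator in place of a model; it is not how streamText() is implemented, and the AI SDK's response helpers add their own framing on top of the raw bytes.

```typescript
// Hypothetical stand-in for an LLM: yields tokens with a small delay.
async function* fakeTokens(): AsyncGenerator<string> {
  for (const token of ["Stream", "ing ", "works", "."]) {
    await new Promise((resolve) => setTimeout(resolve, 10));
    yield token;
  }
}

// Wrap an async token source in a ReadableStream, encoding each token
// and enqueuing it immediately so the client can render it right away.
function toTextStream(tokens: AsyncIterable<string>): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    async start(controller) {
      for await (const token of tokens) {
        controller.enqueue(encoder.encode(token));
      }
      controller.close();
    },
  });
}

// In a route handler this would be: return new Response(toTextStream(...));
// Here we read the stream back to show chunks arrive one at a time.
async function readAll(stream: ReadableStream<Uint8Array>): Promise<string[]> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  const chunks: string[] = [];
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(decoder.decode(value, { stream: true }));
  }
  return chunks;
}
```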

AI Gateway

Route to 100+ models with failover, for example OpenAI as primary and Anthropic as fallback. Includes rate limiting, per-model and per-user cost tracking, and prompt caching.
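Failover can be pictured as trying an ordered list of providers and moving to the next one when a call throws. The sketch below implements that with hypothetical stub providers and a hypothetical withFailover helper; with AI Gateway you would rely on its built-in failover rather than hand-writing this loop, so treat it as an illustration of the behavior only.

```typescript
// Hypothetical provider call signature.
type Provider = { name: string; complete: (prompt: string) => Promise<string> };

// Try each provider in order; return the first success, or throw
// an error listing every failure if all providers are down.
async function withFailover(
  providers: Provider[],
  prompt: string,
): Promise<{ provider: string; text: string }> {
  const failures: string[] = [];
  for (const p of providers) {
    try {
      const text = await p.complete(prompt);
      return { provider: p.name, text };
    } catch (err) {
      failures.push(`${p.name}: ${(err as Error).message}`);
    }
  }
  throw new Error(`All providers failed -> ${failures.join("; ")}`);
}

// Stubs: the primary simulates an outage, the fallback succeeds.
const primary: Provider = {
  name: "openai",
  complete: async () => {
    throw new Error("503 Service Unavailable");
  },
};
const fallback: Provider = {
  name: "anthropic",
  complete: async (prompt) => `answer to "${prompt}"`,
};
```

Calling withFailover([primary, fallback], prompt) resolves with the fallback's answer when the primary throws, and rejects only if every provider fails.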

Tool Calling & Agents

AI SDK 6 provides structured tool calling. Multi-step agents use the "use workflow" directive for durable orchestration that survives function timeouts.
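Multi-step tool calling follows a loop: the model either requests a tool or returns a final answer; tool results are appended to the conversation and the model is called again, up to a step limit. The sketch below models that loop with a scripted, hypothetical fakeModel and a stub getWeather tool; in the AI SDK the loop runs for you when you pass tools to streamText() or generateText() with a step limit.

```typescript
// Hypothetical model output: either a tool request or a final answer.
type ModelTurn =
  | { kind: "tool-call"; tool: string; args: Record<string, string> }
  | { kind: "final"; text: string };

type Message = { role: "user" | "assistant" | "tool"; content: string };

// Tool registry: name -> implementation (stubbed, no network).
const tools: Record<string, (args: Record<string, string>) => string> = {
  getWeather: ({ city }) => `Sunny, 22°C in ${city}`,
};

// Scripted stand-in for an LLM: asks for weather until it sees a tool result.
function fakeModel(history: Message[]): ModelTurn {
  const toolResult = history.find((m) => m.role === "tool");
  return toolResult
    ? { kind: "final", text: `Forecast: ${toolResult.content}` }
    : { kind: "tool-call", tool: "getWeather", args: { city: "Paris" } };
}

function runAgent(prompt: string, maxSteps = 5): { answer: string; steps: number } {
  const history: Message[] = [{ role: "user", content: prompt }];
  for (let step = 1; step <= maxSteps; step++) {
    const turn = fakeModel(history);
    if (turn.kind === "final") return { answer: turn.text, steps: step };
    const tool = tools[turn.tool];
    if (!tool) throw new Error(`Unknown tool: ${turn.tool}`);
    // Record the call and its result so the next model turn can see them.
    history.push({ role: "assistant", content: `call ${turn.tool}` });
    history.push({ role: "tool", content: tool(turn.args) });
  }
  throw new Error(`No final answer within ${maxSteps} steps`);
}
```

The step limit is the safety valve: without it, a model that keeps requesting tools would loop indefinitely.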

Vector Database (RAG)

Pinecone, Qdrant, or Weaviate for semantic search over documents. Retrieval-Augmented Generation for grounded responses.
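Retrieval in a RAG pipeline ranks stored chunks by similarity to the query embedding and places the best matches into the prompt. The sketch below does that in memory with cosine similarity over tiny hand-written vectors; a vector database such as Pinecone, Qdrant, or Weaviate performs the same ranking at scale with approximate indexes, and real embeddings would come from an embedding model. The Chunk type, topK, and buildPrompt are hypothetical names.

```typescript
interface Chunk {
  text: string;
  embedding: number[];
}

// Cosine similarity: dot product divided by the product of vector norms.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks most similar to the query embedding.
function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return [...chunks]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k);
}

// Ground the model by putting retrieved text into the prompt.
function buildPrompt(question: string, context: Chunk[]): string {
  const sources = context.map((c, i) => `[${i + 1}] ${c.text}`).join("\n");
  return `Answer using only these sources:\n${sources}\n\nQuestion: ${question}`;
}

// Toy 3-dimensional "embeddings" for illustration only.
const docs: Chunk[] = [
  { text: "Refunds are processed within 5 days.", embedding: [0.9, 0.1, 0.0] },
  { text: "Our office is in Berlin.", embedding: [0.0, 0.2, 0.9] },
  { text: "Refund requests need an order ID.", embedding: [0.8, 0.3, 0.1] },
];

const queryEmbedding = [1, 0.2, 0]; // pretend embedding of "How do refunds work?"
const retrieved = topK(queryEmbedding, docs, 2);
const prompt = buildPrompt("How do refunds work?", retrieved);
```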

Sandbox (Code Execution)

Vercel Sandbox for isolated JavaScript/TypeScript execution. AI agents can safely run generated code.

Streaming Chat — Code Example

Server: API Route

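// app/api/chat/route.ts
import { openai } from '@ai-sdk/openai';
import { convertToModelMessages, streamText, type UIMessage } from 'ai';

export async function POST(request: Request) {
  // useChat() posts the full UI message history as JSON.
  const { messages }: { messages: UIMessage[] } = await request.json();

  const result = streamText({
    model: openai('gpt-4o'),
    system: 'You are a helpful assistant.',
    // Convert UI messages (which carry typed parts) into model messages.
    messages: await convertToModelMessages(messages),
  });

  // Stream tokens back in the format useChat() expects.
  return result.toUIMessageStreamResponse();
}
// With Fluid Compute:
// 10s stream, 200ms active CPU = billed for 200ms of CPU time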

Client: Chat UI

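'use client';
import { useState } from 'react';
import { useChat } from '@ai-sdk/react';

export default function Chat() {
  // AI SDK 5+ no longer manages input state, so keep it locally.
  const [input, setInput] = useState('');
  // Sends messages to /api/chat by default.
  const { messages, sendMessage } = useChat();

  return (
    <div>
      {messages.map(m => (
        <div key={m.id}>
          {m.role}:{' '}
          {m.parts.map((part, i) =>
            part.type === 'text' ? <span key={i}>{part.text}</span> : null
          )}
        </div>
      ))}
      <form
        onSubmit={e => {
          e.preventDefault();
          if (!input.trim()) return;
          sendMessage({ text: input });
          setInput('');
        }}
      >
        <input value={input} onChange={e => setInput(e.target.value)} />
      </form>
    </div>
  );
}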

💰 Fluid Compute Cost Impact

Scenario	Traditional (billed on wall-clock time)	Fluid Compute (billed on active CPU)	CPU-time savings
30s LLM streaming	30,000ms billed	200ms billed	99.3%
5s RAG query	5,000ms billed	400ms billed	92%
Multi-step agent (3 LLM calls)	45,000ms billed	600ms billed	98.7%
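The savings column follows from one formula: savings = 1 - (active CPU ms / wall-clock ms). The helper below reproduces the table's percentages from its own millisecond figures. It models CPU billing only and ignores memory and invocation charges, so it illustrates the ratio rather than serving as a pricing calculator; the Scenario type and savingsPercent function are hypothetical names.

```typescript
interface Scenario {
  name: string;
  wallClockMs: number; // billed under duration-based pricing
  activeCpuMs: number; // billed under active-CPU pricing
}

// Percentage of billed time saved, rounded to one decimal place.
function savingsPercent({ wallClockMs, activeCpuMs }: Scenario): number {
  return Math.round((1 - activeCpuMs / wallClockMs) * 1000) / 10;
}

// Figures taken from the cost table.
const scenarios: Scenario[] = [
  { name: "30s LLM streaming", wallClockMs: 30_000, activeCpuMs: 200 },
  { name: "5s RAG query", wallClockMs: 5_000, activeCpuMs: 400 },
  { name: "Multi-step agent (3 LLM calls)", wallClockMs: 45_000, activeCpuMs: 600 },
];

for (const s of scenarios) {
  console.log(`${s.name}: ${savingsPercent(s)}% saved`);
}
// 30s LLM streaming: 99.3% saved
// 5s RAG query: 92% saved
// Multi-step agent (3 LLM calls): 98.7% saved
```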

Customer Spotlight: SERHANT.

Real estate tech company SERHANT. orchestrates OpenAI + Claude + Gemini per task type using AI Gateway. Property descriptions use GPT-4o. Market analysis uses Claude. Visual processing uses Gemini.

AI Gateway routes each task to the optimal model automatically, with fallback if any provider has an outage. Cost tracking per model gives their finance team full visibility into AI spend.
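Per-task routing with per-model spend tracking, as in this example, can be sketched as a lookup table from task type to model plus a running cost tally. Everything here is hypothetical: the task names, every model ID other than gpt-4o, and the per-token prices are placeholders, not SERHANT.'s configuration or real provider pricing.

```typescript
type Task = "property-description" | "market-analysis" | "visual-processing";

// Hypothetical routing table; model IDs other than gpt-4o are placeholders.
const routes: Record<Task, string> = {
  "property-description": "openai/gpt-4o",
  "market-analysis": "anthropic/claude-model",
  "visual-processing": "google/gemini-model",
};

// Placeholder prices in USD per 1,000 tokens (not real pricing).
const pricePer1kTokens: Record<string, number> = {
  "openai/gpt-4o": 0.01,
  "anthropic/claude-model": 0.015,
  "google/gemini-model": 0.005,
};

// Running spend per model, the kind of breakdown a finance team would review.
const spendByModel = new Map<string, number>();

// Pick the model for a task and record the estimated cost of the call.
function route(task: Task, tokens: number): string {
  const model = routes[task];
  const cost = (tokens / 1000) * pricePer1kTokens[model];
  spendByModel.set(model, (spendByModel.get(model) ?? 0) + cost);
  return model;
}

route("property-description", 2000);
route("market-analysis", 4000);
route("property-description", 1000);
```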

Read on Vercel Blog →