🤖 AI-Powered Product (Chat/Agent)
Build AI chatbots, document analyzers, code generators, and multi-agent workflows using Vercel's AI Cloud layer. The stack combines the AI SDK for model abstraction, AI Gateway for routing, Fluid Compute for cost-efficient streaming, and Workflows for durable agents.
Why Vercel for AI Apps?
🔄 Provider Agnostic
AI SDK lets you swap OpenAI → Anthropic → Google in one line. No rewrite, no business logic changes.
💰 98% Cost Reduction
Fluid Compute bills only active CPU time. A 30s stream that uses 200ms of CPU is billed for 200ms, which is transformative for AI workloads that spend most of their time waiting on the model.
🛡️ Durable Agents
use-workflow provides checkpointed steps. Multi-step agents survive function timeouts with automatic resume.
🌍 Global Low Latency
Edge Network + regional functions ensure fast first-token time for users worldwide.
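
The checkpoint-and-resume idea behind Durable Agents can be sketched in a few lines. This is a minimal in-memory illustration under stated assumptions, not the Workflows API: `runSteps`, `Step`, and the `Map`-based checkpoint store are hypothetical names, and the steps are synchronous to keep the sketch short. A real durable runtime would persist checkpoints outside the function and run asynchronous steps.

```typescript
// Hypothetical sketch of checkpointed steps; not the Vercel Workflows API.
type Step = { name: string; run: (previous: unknown) => unknown };

// Runs steps in order, skipping any step whose result is already checkpointed.
function runSteps(steps: Step[], checkpoints: Map<string, unknown>): unknown {
  let previous: unknown = undefined;
  for (const step of steps) {
    if (!checkpoints.has(step.name)) {
      // If run() throws (for example, on a timeout), earlier checkpoints survive.
      checkpoints.set(step.name, step.run(previous));
    }
    previous = checkpoints.get(step.name);
  }
  return previous;
}

// Simulate an agent whose second step fails on the first attempt.
const calls: string[] = [];
let failOnce = true;
const agentSteps: Step[] = [
  { name: "plan", run: () => { calls.push("plan"); return "outline"; } },
  {
    name: "draft",
    run: (plan) => {
      calls.push("draft");
      if (failOnce) { failOnce = false; throw new Error("simulated timeout"); }
      return `draft from ${plan}`;
    },
  },
];

const store = new Map<string, unknown>();
try {
  runSteps(agentSteps, store); // first attempt stops at "draft"
} catch {
  // A durable runtime would schedule a resume at this point.
}
const finalResult = runSteps(agentSteps, store); // resume: "plan" is not re-run
```

On the second call the `plan` step is read from the checkpoint store instead of being executed again, which is the property that lets a multi-step agent pick up after a timeout without repeating completed work.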
Architecture Layers
Chat UI (Next.js + useChat)
useChat() hook for streaming message rendering, input handling, and message history. React Server Components for fast shell load.
Streaming API (Vercel Functions)
streamText() from the AI SDK sends LLM tokens to the client progressively. Fluid Compute bills only active CPU, which yields 80-90% savings for streaming.
AI Gateway
Route to 100+ models with failover. OpenAI primary, Anthropic fallback. Rate limiting, cost tracking per model/user, prompt caching.
Tool Calling & Agents
AI SDK 6 structured tool calling. Multi-step agents with use-workflow for durable orchestration that survives function timeouts.
Vector Database (RAG)
Pinecone, Qdrant, or Weaviate for semantic search over documents. Retrieval-Augmented Generation for grounded responses.
Sandbox (Code Execution)
Vercel Sandbox for isolated JavaScript/TypeScript execution. AI agents can safely run generated code.
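
The retrieval step of the Vector Database (RAG) layer above can be illustrated without any external service. The sketch below is an in-memory stand-in, assuming toy 3-dimensional embeddings; `Doc`, `retrieve`, and `buildPrompt` are hypothetical names and do not correspond to the Pinecone, Qdrant, or Weaviate client APIs, which would perform the similarity search server-side.

```typescript
// Hypothetical in-memory stand-in for a vector database query.
type Doc = { id: string; text: string; embedding: number[] };

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the k documents most similar to the query embedding.
function retrieve(query: number[], docs: Doc[], k: number): Doc[] {
  return [...docs]
    .sort((x, y) => cosineSimilarity(query, y.embedding) - cosineSimilarity(query, x.embedding))
    .slice(0, k);
}

// Builds a grounded prompt by prepending the retrieved passages.
function buildPrompt(question: string, context: Doc[]): string {
  const passages = context.map((d) => `- ${d.text}`).join("\n");
  return `Answer using only this context:\n${passages}\n\nQuestion: ${question}`;
}

// Toy embeddings; real ones would come from an embedding model.
const docs: Doc[] = [
  { id: "pricing", text: "Plans are billed monthly.", embedding: [1, 0, 0] },
  { id: "refunds", text: "Refunds take 5 days.", embedding: [0, 1, 0] },
  { id: "support", text: "Support is available 24/7.", embedding: [0, 0, 1] },
];
const top = retrieve([0.1, 0.9, 0.2], docs, 2);
const prompt = buildPrompt("How long do refunds take?", top);
```

The resulting `prompt` string is what would be passed to the model, so the answer is grounded in the retrieved passages rather than in the model's general knowledge.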
Streaming Chat — Code Example
Server: API Route
// app/api/chat/route.ts
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';

export async function POST(request: Request) {
  // The chat client posts the conversation history as JSON.
  const { messages } = await request.json();

  // Start streaming a completion; tokens are forwarded as they arrive.
  const result = streamText({
    model: openai('gpt-4o'),
    messages,
    system: 'You are a helpful assistant.',
  });

  return result.toDataStreamResponse();
}

// With Fluid Compute:
// 10s stream, 200ms CPU = billed 200ms
Client: Chat UI
'use client';
import { useChat } from '@ai-sdk/react';

export default function Chat() {
  // useChat() manages message state and posts to /api/chat by default.
  const { messages, input, handleInputChange, handleSubmit } = useChat();

  return (
    <div>
      {messages.map(m => (
        <div key={m.id}>
          {m.role}: {m.content}
        </div>
      ))}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} />
      </form>
    </div>
  );
}
💰 Fluid Compute Cost Impact
| Scenario | Traditional | Fluid Compute | Savings |
|---|---|---|---|
| 30s LLM streaming | 30,000ms billed | 200ms billed | 99.3% |
| 5s RAG query | 5,000ms billed | 400ms billed | 92% |
| Multi-step agent (3 LLM calls) | 45,000ms billed | 600ms billed | 98.7% |
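
The Savings column follows directly from the two billed durations in each row. As a small worked check, assuming savings are defined as the fraction of wall-clock time that is no longer billed, the arithmetic looks like this (`savingsPercent` and `scenarios` are illustrative names, not part of any Vercel API):

```typescript
// Savings from billing active CPU time instead of wall-clock duration.
function savingsPercent(wallClockMs: number, activeCpuMs: number): number {
  return (1 - activeCpuMs / wallClockMs) * 100;
}

// Durations taken from the table above.
const scenarios = [
  { name: "30s LLM streaming", wallClockMs: 30_000, activeCpuMs: 200 },
  { name: "5s RAG query", wallClockMs: 5_000, activeCpuMs: 400 },
  { name: "Multi-step agent (3 LLM calls)", wallClockMs: 45_000, activeCpuMs: 600 },
];

for (const s of scenarios) {
  const pct = savingsPercent(s.wallClockMs, s.activeCpuMs);
  console.log(`${s.name}: ${pct.toFixed(1)}% saved`);
}
```

The same function can be applied to measured durations from a real workload to estimate how much of its wall-clock time is idle waiting on the model.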
Customer Spotlight: SERHANT.
Real estate tech company SERHANT. orchestrates OpenAI + Claude + Gemini per task type using AI Gateway. Property descriptions use GPT-4o. Market analysis uses Claude. Visual processing uses Gemini.
AI Gateway routes each task to the optimal model automatically, with fallback if any provider has an outage. Cost tracking per model gives their finance team full visibility into AI spend.
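
The per-task routing with fallback described above amounts to a preference list per task type. The sketch below is a simplified model of that decision, not the AI Gateway configuration format or API: `Task`, `Provider`, `preferred`, and `pickProvider` are hypothetical names, and the fallback order after each primary provider is an assumption made for illustration, since the source names only the primary provider for each task.

```typescript
// Hypothetical routing table; not the AI Gateway configuration format.
type Task = "property-description" | "market-analysis" | "visual-processing";
type Provider = "openai" | "anthropic" | "google";

// First entry is the primary provider named in the spotlight;
// the remaining order is an assumed fallback sequence.
const preferred: Record<Task, Provider[]> = {
  "property-description": ["openai", "anthropic", "google"],
  "market-analysis": ["anthropic", "openai", "google"],
  "visual-processing": ["google", "openai", "anthropic"],
};

// Picks the first healthy provider in the task's preference order.
function pickProvider(task: Task, healthy: Set<Provider>): Provider {
  const choice = preferred[task].find((p) => healthy.has(p));
  if (!choice) throw new Error(`No healthy provider for ${task}`);
  return choice;
}

const allUp = new Set<Provider>(["openai", "anthropic", "google"]);
const anthropicDown = new Set<Provider>(["openai", "google"]);

const normalRoute = pickProvider("market-analysis", allUp);
const outageRoute = pickProvider("market-analysis", anthropicDown);
```

With every provider healthy, market analysis goes to its primary provider; when that provider is marked unhealthy, the request falls through to the next entry in the list.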