6. Fluid Compute — The 2025 Game Changer
Fluid Compute is Vercel's biggest architectural evolution since the platform launched. It was announced in February 2025 and has been enabled by default for new projects since April 23, 2025.
The problem with traditional serverless
Traditional serverless dedicates a microVM to each function invocation. When a function is waiting on a database query or external API call (I/O), the microVM sits idle, but you're still paying for it.
For AI applications, this is especially painful: a streaming LLM response can take 10–60 seconds, but the actual CPU computation is only a fraction of that. Under traditional pricing, you pay for all 60 seconds.
What Fluid Compute does differently
1. Shared instances (not microVM-per-invocation)
Multiple invocations can share the same physical instance concurrently. Think of it as "mini-servers" rather than single-use functions. This eliminates the cold start overhead of spinning up a new microVM for each request.
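The practical payoff of shared instances is that module-scope initialization runs once per instance rather than once per request. A minimal sketch, with illustrative names (this simulates the model in plain TypeScript; it is not a Vercel API):

```typescript
// Simulated expensive setup (e.g. a database client or SDK instance).
let initCount = 0;
function createClient() {
  initCount++; // counts how many times we paid the setup cost
  return { query: async (sql: string) => `result for: ${sql}` };
}

// Module scope: on a shared instance this runs once, NOT once per request.
const client = createClient();

// Per-request handler: many of these can run concurrently on one instance,
// all reusing the same client.
async function handler(requestId: number): Promise<string> {
  return client.query(`SELECT * FROM users WHERE id = ${requestId}`);
}

// Simulate 100 concurrent requests hitting the same warm instance.
async function main() {
  await Promise.all(Array.from({ length: 100 }, (_, i) => handler(i)));
  console.log(`setup ran ${initCount} time(s) for 100 requests`);
}
main();
```

Under the microVM-per-invocation model, the setup cost above would be paid on every cold start; on a shared instance it amortizes across all concurrent requests.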
2. Active CPU pricing
You only pay for the milliseconds your code actually executes on the CPU — not for I/O wait time.
- Traditional: billed for the entire request duration (10s of LLM streaming = 10s billed)
- Fluid: billed only for active CPU time (10s of streaming with 200ms of CPU = 200ms billed)
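The difference between the two models reduces to what the meter counts. A hedged sketch of the comparison (the interface and function names are illustrative, not Vercel's actual meter):

```typescript
// A request's time splits into on-CPU work and I/O wait.
interface RequestProfile {
  cpuMs: number;    // time actually executing on the CPU
  ioWaitMs: number; // time blocked on I/O (DB, LLM stream, external APIs)
}

// Traditional serverless: billed for wall-clock duration = CPU + I/O wait.
const traditionalBilledMs = (r: RequestProfile) => r.cpuMs + r.ioWaitMs;

// Fluid Compute: billed only for active CPU time.
const fluidBilledMs = (r: RequestProfile) => r.cpuMs;

// The streaming example above: ~10s wall clock, only 200ms on-CPU.
const llmStream: RequestProfile = { cpuMs: 200, ioWaitMs: 9_800 };

console.log(traditionalBilledMs(llmStream)); // 10000
console.log(fluidBilledMs(llmStream));       // 200
```

For I/O-dominated workloads like LLM streaming, almost all of the billed time under the traditional model is idle wait, which is exactly the portion Fluid stops charging for.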
3. Bytecode caching
V8 bytecode is cached across invocations in production, eliminating parse/compile overhead on repeated executions. Only available in production (not preview deployments).
4. Error isolation
Unhandled errors in one concurrent request do not crash other requests sharing the same instance.
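The guarantee can be pictured with plain promises; the real isolation is enforced by the Fluid runtime, and this is only a simulation of the observable behavior:

```typescript
// Three concurrent requests sharing one instance; request 2 fails.
async function handler(id: number): Promise<string> {
  if (id === 2) throw new Error(`request ${id} failed`); // simulated unhandled error
  return `request ${id} ok`;
}

async function main() {
  // allSettled mirrors the isolation property: one rejection does not
  // prevent the sibling requests from completing successfully.
  const results = await Promise.allSettled([handler(1), handler(2), handler(3)]);
  for (const r of results) {
    console.log(
      r.status === "fulfilled" ? r.value : `isolated failure: ${(r.reason as Error).message}`
    );
  }
}
main();
```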
5. Multi-region failover (Enterprise)
Failover goes first to another availability zone in the same region. If the entire region is down, traffic is automatically routed to the next closest region.
Cost impact example (from Vercel docs)
Function with 4GB memory in São Paulo:
- CPU rate: $0.221/hour
- Memory rate: $0.0183/GB-hour
- Request: 4 seconds active CPU, 10 seconds instance lifetime
- Active CPU cost: (4s / 3600) × $0.221 ≈ $0.000245
- Memory cost: (10s / 3600) × 4GB × $0.0183 ≈ $0.000203
- Total per request: ~$0.000448
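The arithmetic above can be checked with a small calculator. The rates come from the example; the function name and shape are illustrative, not a Vercel API:

```typescript
// Rates from the São Paulo (4GB memory) example.
const CPU_RATE_PER_HOUR = 0.221;      // $ per hour of active CPU
const MEM_RATE_PER_GB_HOUR = 0.0183;  // $ per GB-hour of provisioned memory

function requestCost(activeCpuSec: number, instanceLifetimeSec: number, memoryGb: number) {
  // CPU is billed only for active execution time...
  const cpuCost = (activeCpuSec / 3600) * CPU_RATE_PER_HOUR;
  // ...while memory is billed for the whole instance lifetime.
  const memCost = (instanceLifetimeSec / 3600) * memoryGb * MEM_RATE_PER_GB_HOUR;
  return { cpuCost, memCost, total: cpuCost + memCost };
}

const { cpuCost, memCost, total } = requestCost(4, 10, 4);
console.log(cpuCost.toFixed(6)); // 0.000246
console.log(memCost.toFixed(6)); // 0.000203
console.log(total.toFixed(6));   // 0.000449 (the ~$0.000448 above sums the rounded line items)
```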
When to recommend Fluid Compute
- AI applications with long I/O (LLM calls, streaming)
- APIs with high concurrency and variable workload
- Any application where cold starts are causing user experience issues
- I/O-heavy backends (database queries, third-party API aggregation)
Fluid Compute limitations
- A shared instance means module-level state persists across requests: writing per-request data to module-level variables leaks it between concurrent requests; warn customers about this
- Bytecode caching applies only to production deployments, not previews
- Memory is billed for the entire instance lifetime, not just CPU time — important for long-running instances
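The module-level state leak in the first limitation is easy to reproduce. A sketch with illustrative names (the leak is simulated with a timer standing in for I/O wait):

```typescript
// BAD: per-request data stored in module scope, shared by all
// concurrent requests on the instance.
let currentUserId: string | null = null;

async function unsafeHandler(userId: string): Promise<string> {
  currentUserId = userId;                       // request A writes...
  await new Promise((r) => setTimeout(r, 10));  // ...awaits I/O, yielding to request B...
  return `charged user ${currentUserId}`;       // ...then reads back B's value
}

// SAFE: keep per-request data in function scope (or a request context).
async function safeHandler(userId: string): Promise<string> {
  await new Promise((r) => setTimeout(r, 10));
  return `charged user ${userId}`;
}

async function demo() {
  // Two concurrent requests on the same instance: alice's request
  // ends up reading bob's id from the shared module variable.
  const [a, b] = await Promise.all([unsafeHandler("alice"), unsafeHandler("bob")]);
  console.log(a); // "charged user bob" -- leaked
  console.log(b); // "charged user bob"
  console.log(await safeHandler("alice")); // "charged user alice" -- correct
}
demo();
```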