6. Fluid Compute — The 2025 Game Changer
Fluid Compute is Vercel's biggest architectural evolution since the platform launched. It was announced in February 2025 and has been enabled by default for new projects since April 23, 2025.
The problem with traditional serverless
Traditional serverless dedicates a microVM to each function invocation. When a function is waiting on a database query or external API call (I/O), the microVM sits idle, but you're still paying for it.
For AI applications, this is especially painful: a streaming LLM response can take 10–60 seconds, but the actual CPU computation is only a fraction of that. Under traditional pricing, you pay for all 60 seconds.
What Fluid Compute does differently
1. Shared instances (not microVM-per-invocation)
Multiple invocations can share the same physical instance concurrently. Think of it as "mini-servers" rather than single-use functions. This eliminates the cold start overhead of spinning up a new microVM for each request.
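The practical payoff of shared instances is that module-scope initialization runs once per instance rather than once per request. A minimal sketch, with illustrative names (this simulates the model in plain TypeScript; it is not a Vercel API):

```typescript
// Simulated expensive setup (e.g. a database client or SDK instance).
let initCount = 0;
function createClient() {
  initCount++; // counts how many times we paid the setup cost
  return { query: async (sql: string) => `result for: ${sql}` };
}

// Module scope: on a shared instance this runs once, NOT once per request.
const client = createClient();

// Per-request handler: many of these can run concurrently on one instance,
// all reusing the same client.
async function handler(requestId: number): Promise<string> {
  return client.query(`SELECT * FROM users WHERE id = ${requestId}`);
}

// Simulate 100 concurrent requests hitting the same warm instance.
async function main() {
  await Promise.all(Array.from({ length: 100 }, (_, i) => handler(i)));
  console.log(`setup ran ${initCount} time(s) for 100 requests`);
}
main();
```

Under the microVM-per-invocation model, the setup cost above would be paid on every cold start; on a shared instance it amortizes across all concurrent requests.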
2. Active CPU pricing
You only pay for the milliseconds your code actually executes on the CPU — not for I/O wait time.
- Traditional: billed for the entire request duration (10s of LLM streaming = 10s billed)
- Fluid: billed only for active CPU time (10s of streaming with 200ms of CPU = 200ms billed)
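The difference between the two models reduces to what the meter counts. A hedged sketch of the comparison (the interface and function names are illustrative, not Vercel's actual meter):

```typescript
// A request's time splits into on-CPU work and I/O wait.
interface RequestProfile {
  cpuMs: number;    // time actually executing on the CPU
  ioWaitMs: number; // time blocked on I/O (DB, LLM stream, external APIs)
}

// Traditional serverless: billed for wall-clock duration = CPU + I/O wait.
const traditionalBilledMs = (r: RequestProfile) => r.cpuMs + r.ioWaitMs;

// Fluid Compute: billed only for active CPU time.
const fluidBilledMs = (r: RequestProfile) => r.cpuMs;

// The streaming example above: ~10s wall clock, only 200ms on-CPU.
const llmStream: RequestProfile = { cpuMs: 200, ioWaitMs: 9_800 };

console.log(traditionalBilledMs(llmStream)); // 10000
console.log(fluidBilledMs(llmStream));       // 200
```

For I/O-dominated workloads like LLM streaming, almost all of the billed time under the traditional model is idle wait, which is exactly the portion Fluid stops charging for.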
3. Bytecode caching
V8 bytecode is cached across invocations in production, eliminating parse/compile overhead on repeated executions. Only available in production (not preview deployments).
4. Error isolation
Unhandled errors in one concurrent request do not crash other requests sharing the same instance.
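The guarantee can be pictured with plain promises; the real isolation is enforced by the Fluid runtime, and this is only a simulation of the observable behavior:

```typescript
// Three concurrent requests sharing one instance; request 2 fails.
async function handler(id: number): Promise<string> {
  if (id === 2) throw new Error(`request ${id} failed`); // simulated unhandled error
  return `request ${id} ok`;
}

async function main() {
  // allSettled mirrors the isolation property: one rejection does not
  // prevent the sibling requests from completing successfully.
  const results = await Promise.allSettled([handler(1), handler(2), handler(3)]);
  for (const r of results) {
    console.log(
      r.status === "fulfilled" ? r.value : `isolated failure: ${(r.reason as Error).message}`
    );
  }
}
main();
```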
5. Multi-region failover (Enterprise)
Failover goes first to another availability zone in the same region. If the entire region is down, traffic is automatically routed to the next closest region.
Cost impact example (from Vercel docs)
Function with 4GB memory in São Paulo:
- CPU rate: $0.221/hour
- Memory rate: $0.0183/GB-hour
- Request: 4 seconds active CPU, 10 seconds instance lifetime
- Active CPU cost: (4s / 3600) × $0.221 ≈ $0.000245
- Memory cost: (10s / 3600) × 4GB × $0.0183 ≈ $0.000203
- Total per request: ~$0.000448
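The arithmetic above can be checked with a small calculator. The rates come from the example; the function name and shape are illustrative, not a Vercel API:

```typescript
// Rates from the São Paulo (4GB memory) example.
const CPU_RATE_PER_HOUR = 0.221;      // $ per hour of active CPU
const MEM_RATE_PER_GB_HOUR = 0.0183;  // $ per GB-hour of provisioned memory

function requestCost(activeCpuSec: number, instanceLifetimeSec: number, memoryGb: number) {
  // CPU is billed only for active execution time...
  const cpuCost = (activeCpuSec / 3600) * CPU_RATE_PER_HOUR;
  // ...while memory is billed for the whole instance lifetime.
  const memCost = (instanceLifetimeSec / 3600) * memoryGb * MEM_RATE_PER_GB_HOUR;
  return { cpuCost, memCost, total: cpuCost + memCost };
}

const { cpuCost, memCost, total } = requestCost(4, 10, 4);
console.log(cpuCost.toFixed(6)); // 0.000246
console.log(memCost.toFixed(6)); // 0.000203
console.log(total.toFixed(6));   // 0.000449 (the ~$0.000448 above sums the rounded line items)
```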
When to recommend Fluid Compute
- AI applications with long I/O (LLM calls, streaming)
- APIs with high concurrency and variable workload
- Any application where cold starts are causing user experience issues
- I/O-heavy backends (database queries, third-party API aggregation)
Fluid Compute limitations
- A shared instance means module-level state persists across requests: writing per-request data to module-level variables leaks it between concurrent requests; warn customers about this
- Bytecode caching applies only to production deployments, not previews
- Memory is billed for the entire instance lifetime, not just CPU time — important for long-running instances
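The module-level state leak in the first limitation is easy to reproduce. A sketch with illustrative names (the leak is simulated with a timer standing in for I/O wait):

```typescript
// BAD: per-request data stored in module scope, shared by all
// concurrent requests on the instance.
let currentUserId: string | null = null;

async function unsafeHandler(userId: string): Promise<string> {
  currentUserId = userId;                       // request A writes...
  await new Promise((r) => setTimeout(r, 10));  // ...awaits I/O, yielding to request B...
  return `charged user ${currentUserId}`;       // ...then reads back B's value
}

// SAFE: keep per-request data in function scope (or a request context).
async function safeHandler(userId: string): Promise<string> {
  await new Promise((r) => setTimeout(r, 10));
  return `charged user ${userId}`;
}

async function demo() {
  // Two concurrent requests on the same instance: alice's request
  // ends up reading bob's id from the shared module variable.
  const [a, b] = await Promise.all([unsafeHandler("alice"), unsafeHandler("bob")]);
  console.log(a); // "charged user bob" -- leaked
  console.log(b); // "charged user bob"
  console.log(await safeHandler("alice")); // "charged user alice" -- correct
}
demo();
```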