Building the AI Inference Platform

11:15 - 11:45, 20th of May (Wednesday) 2026 / AI & Architecture

We will talk about the unique technical complexities and architectural decisions required to build a high-performance, multi-tenant serverless AI inference platform. Unlike traditional serverless workloads, AI inference must manage model loading, accelerator selection, batching, and memory-heavy runtimes, so infrastructure choices are intrinsically linked to latency, cost, isolation, and margin. Key topics include transforming "cold starts" into complex model placement problems, designing state-aware routing for LLM workloads that benefit from a preserved KV cache, and the critical role of GPU orchestration. Finally, we will detail the architectural systems needed to serve hundreds of customers while maintaining fairness, security, and utilization.
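To make the "state-aware routing" idea concrete, here is a minimal sketch of a KV-cache-aware router. All names (`StateAwareRouter`, worker ids, session ids) are hypothetical, not from the talk: requests belonging to the same session are pinned to the worker that already holds their KV cache, while new sessions fall back to the least-loaded worker.

```python
class StateAwareRouter:
    """Hypothetical sketch: route LLM requests so that a session's
    follow-up requests land on the worker already holding its KV cache."""

    def __init__(self, workers):
        self.workers = list(workers)               # worker ids, e.g. ["gpu-0", "gpu-1"]
        self.load = {w: 0 for w in self.workers}   # in-flight requests per worker
        self.session_affinity = {}                 # session id -> worker with KV cache

    def route(self, session_id):
        # Prefer the worker that already holds this session's KV cache.
        worker = self.session_affinity.get(session_id)
        if worker is None:
            # Cold session: pick the least-loaded worker and record affinity.
            worker = min(self.workers, key=lambda w: self.load[w])
            self.session_affinity[session_id] = worker
        self.load[worker] += 1
        return worker

    def complete(self, session_id):
        # Called when a request finishes; releases one unit of load.
        worker = self.session_affinity[session_id]
        self.load[worker] -= 1

    def evict(self, session_id):
        # Called when a worker drops a session's KV cache under memory pressure;
        # the next request for this session is treated as cold.
        self.session_affinity.pop(session_id, None)
```

A real platform layers fairness quotas, cache-eviction signals from workers, and placement cost models on top of this affinity logic; the sketch only shows why routing and cache state cannot be decided independently.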
