To support a growing catalog of AI models while maximizing GPU utilization, Cloudflare built an internal platform called Omni. Omni uses lightweight isolation and memory over-commitment to run multiple AI models on a single GPU, allowing us to serve inference requests closer to users and improve overall availability across our ...
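To illustrate the over-commitment idea, here is a minimal sketch (not Omni's actual code) of a scheduler that places models on GPUs while letting the sum of declared model footprints exceed physical memory by an assumed over-commit factor, on the premise that lightly used models rarely peak at the same time. All names (`Gpu`, `pick_gpu`, `OVERCOMMIT_FACTOR`) are illustrative.

```python
from dataclasses import dataclass

OVERCOMMIT_FACTOR = 1.5  # assumed ratio; a real deployment would tune this per GPU pool


@dataclass
class Gpu:
    name: str
    physical_gb: float
    committed_gb: float = 0.0  # sum of declared footprints of models placed here


def pick_gpu(gpus: list[Gpu], model_gb: float) -> Gpu | None:
    """Place a model on the least-loaded GPU that still fits under the
    over-commit limit; return None if no GPU can take it."""
    limit = lambda g: g.physical_gb * OVERCOMMIT_FACTOR
    candidates = [g for g in gpus if g.committed_gb + model_gb <= limit(g)]
    if not candidates:
        return None
    best = min(candidates, key=lambda g: g.committed_gb / g.physical_gb)
    best.committed_gb += model_gb
    return best


if __name__ == "__main__":
    pool = [Gpu("gpu-0", 40.0), Gpu("gpu-1", 40.0)]
    for footprint in (22.0, 30.0, 25.0):  # 77 GB declared across 80 GB physical
        chosen = pick_gpu(pool, footprint)
        print(f"{footprint} GB -> {chosen.name if chosen else 'rejected'}")
```

In this sketch, a 40 GB GPU can accept up to 60 GB of declared model memory; the later sections describe how the real platform keeps such over-committed models isolated from one another.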