Apps on Azure Blog

9 MIN READ

Anyscale on Azure: Powering Enterprise AI at Massive Scale on Azure Kubernetes Service

Microsoft

Jun 02, 2026

Somewhere on your AI platform team, an engineer is on call this weekend — not for the model, not for the training run, but for the integration code holding five separate AI processing systems together. Data preparation on one. Training on a second. Evaluation on a third. Serving on a fourth. Observability bolted on across all of it. The glue between them has quietly evolved into a production system of its own, complete with its own failure modes and its own pager.

This is what running AI at scale looks like for most enterprises in 2026. To process the full breadth of AI workloads, teams don’t have one platform, but a stack of multiple compute engines — stitched together and monitored around the clock. Training failures become increasingly costly as multi-node GPU clusters remain underutilized and difficult to operate. Inference costs climb in a straight line when they should be bending the other way. And the accelerators underneath, at six figures a year per node, sit at 30–40% utilization.

None of this is a model problem. It is a systems problem, and it exposes a divide that is widening across the industry.

The AI shift: Moving from API inference calls only to end-to-end AI

Most enterprises start an AI journey by calling hosted model APIs. It’s the fastest way to experiment and ship. But as adoption grows, inference costs scale in a straight line while differentiation remains limited. The organizations pulling ahead are doing more than consuming models. They are customizing them with proprietary data, operating them at scale, and owning the infrastructure between their data and their models. Their unit economics improve as they scale. The dividing line isn’t budget. It isn’t ambition. It is a single architectural decision: whether the layer between your data and your models is something you rent in pieces or run as a single system.

That unified system for end-to-end AI, almost without exception, is built on one runtime: Ray, an open-source framework widely adopted by AI-natives such as Cursor, Mistral and xAI to act as the engine that powers many of their workloads from multimodal data processing to reinforcement learning.

Anyscale on Azure: Build and run end-to-end AI on your Azure subscription

Anyscale on Azure brings the distributed compute runtime the AI industry has converged on — Ray— into your Azure tenant as an Azure Native service, that includes purpose-built developer tooling and unified pane for cluster management, built through deep engineering collaboration between Anyscale and Microsoft.

Unlike other processing engines which either only support one hardware type (e.g. CPUs) or focus on a single workload (e.g. inference), Ray turns a heterogeneous cluster of CPUs and GPUs into a single Python runtime composing data preparation, distributed training, fine-tuning, reinforcement learning, high-throughput inference, and agentic execution as one program, not five interlocking systems. Anyscale created Ray and stewards the open-source Ray project, now governed by the PyTorch Foundation; the Anyscale Runtime is the production-grade layer that enterprises can utilize on critical paths from day one, bringing managed cluster operations, enterprise-grade support, and the operational reliability needed to run AI and data workloads at scale.

On Azure, that runtime executes on your Azure Kubernetes Service (AKS) clusters, inside your subscription, and under Microsoft Entra ID workload identity. Your data, models, and weights never leave your cloud, and consumption is billed through Azure with drawdown against your existing Azure commitment (MACC).

Sovereignty isn't a label bolted on after the fact. It is the architectural starting point: customer-owned data and models in the customer-owned tenant and governance boundary. The variable per-token economics of hosted APIs are replaced with compute you govern directly. Your proprietary data becomes a compounding advantage rather than a payload shipped to a third-party endpoint.

A single runtime for the full AI lifecycle

The cost profile of enterprise AI is largely architectural. Fragmented stacks — separate systems for prep, training, evaluation, and serving — produce a predictable set of failure modes such as Idle GPU time, Integration code and cross-system data movement.

The result: production GPU utilization only in the 30–40% range, against accelerators that cost six figures per node per year.

On the same fleet, Anyscale customers run those accelerators at 80%+ sustained utilization and report 40–60% lower GPU spend versus static, single-tenant clusters — driven by fractional GPU allocation (down to 0.2 of a device), bin-packing across complementary memory and compute profiles, gang scheduling for distributed training, priority-aware preemption that lets production inference take precedence over ad-hoc training, and spot integration with checkpoint-aware preemption so long-running jobs survive reclamation without lost work.

Anyscale on Azure replaces this with a single Ray-powered runtime that spans the lifecycle as one distributed computation graph:

Ray Data (distributed preparation) → Ray Train (fault-tolerant training) → Ray Tune (hyperparameter search) → Ray Serve (inference) — under one managed control plane.

On top of open-source Ray, the Anyscale Runtime adds fault-tolerant training with checkpoint/restart, optimized scheduling, faster cluster bring-up, inference-aware autoscaling, and per-stage observability.

Ray is the unifying layer that, rather than replacing, streamlines distributed processing of the framework stack the AI industry already uses: PyTorch, Hugging Face Transformers, FSDP, DeepSpeed, and Megatron for training, vLLM and SGLang for high-throughput inference with continuous batching, paged attention, and speculative decoding. Ray Train orchestrates the three parallelism patterns modern training requires — data parallel, model parallel, and hybrid 3D parallel (data + tensor + pipeline) — for trillion-parameter models, without requiring teams to write custom distributed code.

The architectural payoff is direct: a single Python program defines a graph spanning CPU-heavy preparation and GPU-heavy training. The model produced by Ray Train is served by Ray Serve in the same cluster, against the same storage. The operational, identity, and observability surface is unified instead of fragmented.

What enterprises deploy with Anyscale on Azure

There are five workloads that power the development of modern AI systems, spanning data processing, training, inference, and simulation. But in most environments, each depends on separate engines, frameworks, and orchestration layers. The resulting fragmentation drives up infrastructure spend, latency, and engineering complexity. This makes a single Ray-based runtime under Anyscale’s managed control plane the operationally rational choice.

Anyscale on Azure provides a complete platform to build and deploy AI applications using the same APIs as open-source Ray. While the data plane runs inside the customer’s AKS cluster, the managed control plane provides a unified interface for development, debugging, and cluster operations.

AI in your trust boundary by design: the architecture

Anyscale on Azure is an Azure Native product — discoverable via the Azure portal and provisioned through Azure Resource Manager with every resource tagged, scoped, and policy‑bound like any other in your subscription.

Anyscale on Azure is a split-plane deployment:

Control plane (managed by Anyscale) — scheduling, jobs, services, workspaces, and observability.
Data plane (your Azure subscription) — Ray clusters run on your AKS, in your VNet, on your storage (Azure Blob / ADLS Gen2 via BlobFuse2).

The trust boundary is what matters — more than any individual data plane feature — for regulated workloads (financial services, healthcare, public sector) and any enterprise where proprietary data is the differentiation.

The execution model:

Workloads run inside your AKS cluster — your subscription, your VNet. Model weights, training data, KV caches, checkpoints, and inference traffic never leave the boundary.
Provisioning is ARM-native — resources tag, scope, and inherit Azure Policy like anything else in the subscription.
Identity is Microsoft Entra ID end to end — workload identity issues pod credentials; RBAC governs access. No long-lived keys, no parallel secret store.
Network controls are yours — Private Link, NSGs, Cilium-based Azure CNI policies, and customer-managed encryption keys via Key Vault.
Audit is the Azure Activity Log — the same surface your compliance team already monitors.
The Anyscale Operator is the only Anyscale-controlled component in your environment — it runs inside your AKS, communicates with the control plane via egress only, and accepts no inbound access from Anyscale.

The result: code and data stay in your Azure subscription. Your existing compliance posture, audit surface, and data residency certifications carry forward — nothing new to attest. Billing rolls through the same Azure invoice with MACC drawdown — no second invoice, no parallel procurement.

Production evidence

Xoople planetary‑scale satellite imagery on Anyscale on Azure; multimodal AI turns spectral data into operational intelligence. "Anyscale lets our teams focus on models and outcomes rather than infrastructure, dramatically accelerating the path from experimentation to deployment," — Milos Colic, VP of Engineering, Xoople.

Wayve trains the next generation of autonomous‑driving foundation models on Anyscale on Azure, running distributed ML and data pipelines across large CPU and GPU fleets. The operational driver is GPU‑capacity aggregation at a scale that no single region or cluster can deliver.

Beyond Anyscale on Azure, the same Ray runtime is used in production at Cursor, Physical Intelligence, xAI, Coinbase, Bedrock Robotics, and Runway. Bedrock Robotics scaled compute 85x on Anyscale without linearly increasing costs. Currently with 12M+ weekly downloads (+400% YoY) and 42K+ GitHub stars and now openly governed under the PyTorch Foundation (Linux Foundation), Ray is becoming the de-factor open-source standard and is not a single-vendor runtime.

Pricing

Pricing is usage‑based and consolidates onto the same Azure invoice as the rest of the customer's subscription, including drawdown against existing Azure commitment (MACC):

Azure infrastructure — standard Azure compute and GPU charges for the AKS substrate the workload runs on, scaling directly with actual usage.
Anyscale service layer — pay‑as‑you‑go through Azure service meters with no upfront commitment, priced by CPU, memory, and GPU type.

Where Anyscale on Azure fits

Base-model intelligence is converging. Enterprises can buy access to the same frontier models, so the model itself is no longer the moat. What separates the enterprises pulling ahead is the layer underneath: how efficiently they run the full AI lifecycle at scale, how much compounding leverage they extract from their proprietary data, and whether they own the runtime that ties it all together. Anyscale on Azure is the Azure Native runtime layer for that posture — bringing the open-source distributed compute standard the AI industry has converged on into the same Azure governance, identity, and procurement model as the rest of the tenant.

The shape of enterprise AI is settling. The teams pulling ahead are not the ones renting the most intelligence through APIs — they are the ones building and operating AI systems inside their own cloud, on their own data, under their own governance, and scaling those systems on the open distributed runtime the industry has already converged on.

Anyscale on Azure is that runtime, delivered as an Azure Native product:

Ray, productionized — the open‑source distributed compute standard for AI, hardened with the Anyscale Runtime, a managed control plane, and observability designed for foundation‑model‑scale workloads.
One runtime, the full AI lifecycle — data preparation, training, fine‑tuning, reinforcement learning, inference, and agentic workloads in a single Python program, on a single substrate, with no cross‑system glue.
Inside your Azure tenant, on the AKS you already run — customer‑owned data, customer‑owned models, customer‑owned governance. Entra identity, Azure RBAC, Private Link, Activity Log audit, and customer‑managed keys end to end.
One Azure invoice — usage‑based pricing through the Marketplace with MACC drawdown; no parallel procurement, no second vendor contract.

If your team is wrestling with GPU utilization, fragmented data‑to‑serving stacks, training jobs that exceed any single region's capacity, or hosted‑API costs that scale faster than your usage — this is the runtime built for that problem.

Try it now

Provision your first Anyscale Cloud by navigating to the Azure portal. Click on "Create" to begin creating the Anyscale cloud resource and link the necessary Azure resources.

Create your Anyscale Cloud directly from Azure Portal.Attach an existing AKS cluster. Configure Storage and ACR Azure resources.Click on "Launch Anyscale" to navigate to the Anyscale console.

Explore the quickstart guides and documentation on Microsoft Learn to get started. For architectural deep‑dives, capacity planning, or a hands‑on workshop with the Anyscale on Azure solution architects, reach out through your Microsoft account team.

Deepen your expertise and deep dive on best practices in the upcoming virtual webinar. Register here.

The infrastructure for the next decade of enterprise AI is here. Build on it.