People search for "Apache Ray" because the name feels like it should be one. It is not. Ray is an open-source distributed compute framework released by RISELab at UC Berkeley in 2017, now maintained primarily by Anyscale and a community of contributors. It is licensed Apache 2.0, which is probably where the confusion starts, but it is not part of the Apache Software Foundation.
This guide is the missing primer: what Ray actually is, what its architecture looks like, what you can build on it, and how it differs from the things people compare it to.
What Ray is in one sentence
Ray is a runtime that lets you turn any Python function or class into a unit of distributed work, schedule it across a cluster of machines, and treat the result as a normal Python object.
The library you get when you pip install ray is roughly 4 million lines of C++ wrapped by a Python API. The C++ core handles scheduling, the Global Control Store (GCS), memory management, and a distributed object store called Plasma. Python is the surface.
The five-line example
```python
import ray

ray.init()

@ray.remote
def square(x):
    return x * x

futures = [square.remote(i) for i in range(100)]
print(ray.get(futures))
```
That program runs locally with ray.init(), and on a 1,000-node cluster with ray.init(address="ray://head:10001"). No code change. That portability is the central design promise.
Architecture in one diagram, in words
Each Ray cluster has:
- One head node running the Global Control Store (GCS) and the autoscaler.
- N worker nodes running Raylets, each of which manages local task scheduling and the local Plasma object store.
- A driver process (your script) that submits tasks to the cluster.
When you call square.remote(5), the driver enqueues a task descriptor. The Raylet on a worker picks it up, runs the function, and stores the result in Plasma. Calling ray.get(future) blocks until the result is materialized in your driver's local Plasma, then returns the Python object.
The clever bit is that large objects do not move through the driver. If task A produces a 4 GB tensor and task B consumes it, the tensor stays in worker shared memory and B reads it via zero-copy. That is how Ray keeps Python distributed compute close to bare-metal throughput.
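A minimal sketch of that pattern: the producer's ObjectRef is handed straight to the consumer, so the array never passes through the driver (sizes shrunk here so it runs on a laptop).

```python
import numpy as np
import ray

ray.init()

@ray.remote
def produce():
    # Result lands in the Plasma store of whichever worker runs this task.
    return np.ones((1000, 1000), dtype=np.float32)

@ray.remote
def consume(arr):
    # Ray resolves the ObjectRef argument before the task runs; on the same
    # node the numpy array is read zero-copy out of shared memory.
    return float(arr.sum())

ref = produce.remote()                 # an ObjectRef, not the array itself
total = ray.get(consume.remote(ref))   # only the small float comes back to the driver
print(total)
```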
The libraries on top
Ray ships Ray Core plus five batteries-included libraries, and those libraries are the actual reason most teams adopt it.
| Library | What it does | When to use |
|---|---|---|
| Ray Core | Tasks, actors, objects | Any custom distributed Python |
| Ray Data | Parallel data preprocessing | Replacing Spark for Python ETL |
| Ray Train | Distributed model training | PyTorch, JAX, TensorFlow at scale |
| Ray Tune | Hyperparameter search | Any sweep above 50 trials |
| Ray Serve | Online model serving | Replicated, scalable inference |
| RLlib | Reinforcement learning | PPO, DQN, IMPALA at scale |
Each is independently useful. The composability — training in Ray Train, tuning with Ray Tune, serving with Ray Serve, all on the same cluster — is the strategic moat.
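As an illustration of how little glue the composition needs, here is a hedged sketch of a Ray Tune sweep using the Ray 2.x Tuner API; the objective function and search space are placeholders, and Train or Serve workloads plug into the same cluster the same way.

```python
import ray
from ray import tune

ray.init()

def objective(config):
    # Stand-in for a real training loop; returns the metric Tune optimizes.
    loss = (config["lr"] - 0.01) ** 2
    return {"loss": loss}

tuner = tune.Tuner(
    objective,
    param_space={"lr": tune.loguniform(1e-4, 1e-1)},
    tune_config=tune.TuneConfig(num_samples=20, metric="loss", mode="min"),
)
results = tuner.fit()
print(results.get_best_result().config)
```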
For a deeper feature-by-feature breakdown see our pillar on Ray vs alternatives.
Why people compare it to Apache Spark
The naming similarity, plus the fact that both are general-purpose distributed compute frameworks. The actual differences:
| Topic | Ray | Spark |
|---|---|---|
| Founded | 2017, RISELab | 2009, AMPLab |
| Language core | C++ with Python | Scala on JVM |
| Foundation | None (Anyscale-led) | Apache Software Foundation |
| Task latency | ~200 microseconds | ~1 to 3 seconds per stage |
| State | Stateful actors built in | Stateless |
| Sweet spot | ML and Python compute | SQL and analytics |
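The stateful-actors row is the difference that matters most in practice: a Ray actor is a class pinned to its own worker process that keeps state between calls, something Spark has no direct equivalent for. A minimal sketch:

```python
import ray

ray.init()

@ray.remote
class Counter:
    def __init__(self):
        self.count = 0  # lives in the actor process between calls

    def increment(self):
        self.count += 1
        return self.count

counter = Counter.remote()            # starts a dedicated worker process
refs = [counter.increment.remote() for _ in range(5)]
print(ray.get(refs))                  # [1, 2, 3, 4, 5] -- state persists across calls
```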
For a deep dive see our Ray vs Spark comparison.
Adoption and scale
Public references in 2026:
- OpenAI uses Ray for distributed training infrastructure, disclosed in the GPT-4 system card.
- Uber runs Ray at thousands of nodes for its ML platform.
- Shopify uses Ray Serve for product recommendation serving.
- Spotify uses Ray Tune for personalization model search.
- Pinterest, ByteDance, LinkedIn, Instacart all have public Ray case studies.
Anyscale published a 2024 survey claiming 38 percent of public-cloud ML workloads run on Ray. The number is self-reported and probably high. Even halved, the user base is enormous.
The four things that trip people up
Ray is not Apache. The project is governed by Anyscale and the open-source community via the Ray Project on GitHub. Anyscale offers a managed cloud platform; Ray itself is free under Apache 2.0.
Ray Serve is not a complete API gateway. It scales a single model. For multi-provider routing, fallbacks, rate limits, billing, and audit you need a layer on top. That is what AI gateways are for; see our LLM serving frameworks comparison.
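For context, what Serve does handle is replicating one deployment behind an HTTP endpoint. A minimal sketch, assuming the Ray 2.x Serve API:

```python
from ray import serve
from starlette.requests import Request

@serve.deployment(num_replicas=2)
class Model:
    async def __call__(self, request: Request) -> dict:
        payload = await request.json()
        # Stand-in for real inference.
        return {"echo": payload}

serve.run(Model.bind())  # serves on http://127.0.0.1:8000/ by default
```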
Ray clusters are stateful. The GCS holds task lineage. Killing the head node loses scheduled state unless you have externalized the GCS to Redis. Always do this in production.
Object spilling is not free. When the in-memory object store fills up, Ray spills to disk. Local NVMe is fine. Spilling to S3 is supported but multi-second per gigabyte. Set RAY_object_spilling_config carefully.
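A hedged sketch of pointing spilling at local NVMe via the _system_config escape hatch the Ray docs describe for object spilling; the RAY_object_spilling_config environment variable takes the same JSON, and the mount path here is a placeholder.

```python
import json
import ray

ray.init(
    _system_config={
        # Spill to a local NVMe mount instead of the default temp directory.
        "object_spilling_config": json.dumps(
            {"type": "filesystem", "params": {"directory_path": "/mnt/nvme/ray_spill"}}
        )
    }
)
```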
Cluster sizing rule of thumb
| Workload | Cluster size |
|---|---|
| Notebook prototyping | 1 node, 8-16 cores |
| Hyperparameter sweep, 200 trials | 4-8 GPU nodes |
| Distributed PyTorch (8B model) | 8-16 GPU nodes |
| LLM batch inference (1B docs) | 16-64 GPU nodes |
| Reinforcement learning at scale | 64-512 nodes |
| Frontier-scale training | 1,000 plus nodes |
Ray scales linearly to roughly 1,000 nodes without special tuning. Above that, ByteDance and OpenAI have run GCS patches for higher throughput; those patches were upstreamed in Ray 2.10 and later.
The Anyscale relationship in plain English
Anyscale is the company that employs most Ray maintainers. It sells:
- A managed Ray cloud product.
- Anyscale Workspaces, a hosted dev environment.
- Anyscale Jobs, a managed batch-job runner.
You do not need Anyscale to run Ray. KubeRay on a vanilla EKS or GKE cluster is fully featured. Anyscale is a convenience layer for teams who do not want to operate Kubernetes.
For the upstream Ray project see github.com/ray-project/ray and docs.ray.io. For a peer-reviewed architectural paper see the original Ray OSDI 2018 paper.
Where Ray fits with model APIs
Ray decides where your compute happens. It does not decide which model handles a request when you have multiple providers in play. Many production systems run Ray for self-hosted training and inference, and burst into managed APIs through a gateway like Swfte Connect when local capacity is saturated. That hybrid pattern is increasingly the default in 2026 because raw GPU supply remains tight.
Hands-on next steps
For working code, see our Ray Python distributed tutorial and Ray example workloads. Both walk through real cluster setup rather than toy examples.
What to do this quarter
- Stop searching for "Apache Ray." It is just Ray. Apache 2.0 license, no Apache Foundation involvement.
- Pip install Ray on a laptop and run the five-line example. The whole API is approachable in an afternoon.
- If you have a Python distributed workload, prototype it on Ray Core; a minimal port from multiprocessing is sketched after this list. Most teams find a 3x to 10x speedup over multiprocessing.
- Stand up a small KubeRay cluster on three nodes. Keep it as a permanent dev environment.
- Externalize the GCS to managed Redis before any production deployment. Skip this and lose state on every head-node restart.
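To make the Ray Core prototyping item concrete, here is a hedged sketch of porting a multiprocessing-style map to Ray tasks; the work function is a placeholder for your own workload.

```python
import time
import ray

ray.init()

def work(x):
    time.sleep(0.1)  # stand-in for real per-item work
    return x * x

# multiprocessing version, for comparison:
#   with multiprocessing.Pool() as pool:
#       results = pool.map(work, range(100))

# Ray version: same function, now scheduled across every core in the cluster.
work_remote = ray.remote(work)
results = ray.get([work_remote.remote(i) for i in range(100)])
print(sum(results))
```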