People search for "Apache Ray" because the name feels like it should be one. It is not. Ray is an open-source distributed compute framework released by RISELab at UC Berkeley in 2017, now maintained primarily by Anyscale and a community of contributors. It is licensed Apache 2.0, which is probably where the confusion starts, but it is not part of the Apache Software Foundation.
This guide is the missing primer: what Ray actually is, what its architecture looks like, what you can build on it, and how it differs from the things people compare it to.
What Ray is in one sentence
Ray is a runtime that lets you turn any Python function or class into a unit of distributed work, schedule it across a cluster of machines, and treat the result as a normal Python object.
The library you get when you pip install ray is roughly 4 million lines of C++ wrapped by a Python API. The C++ core handles scheduling, the Global Control Store (GCS), memory management, and a distributed object store called Plasma. Python is the surface.
The five-line example
```python
import ray

ray.init()

@ray.remote
def square(x):
    return x * x

futures = [square.remote(i) for i in range(100)]
print(ray.get(futures))
```
That program runs locally with ray.init(), and on a 1,000-node cluster with ray.init(address="ray://head:10001"). No code change. That portability is the central design promise.
Architecture in one diagram, in words
Each Ray cluster has:
- One head node running the Global Control Store (GCS) and the autoscaler.
- N worker nodes running Raylets, each of which manages local task scheduling and the local Plasma object store.
- A driver process (your script) that submits tasks to the cluster.
When you call square.remote(5), the driver enqueues a task descriptor. The Raylet on a worker picks it up, runs the function, and stores the result in Plasma. Calling ray.get(future) blocks until the result is materialized in your driver's local Plasma, then returns the Python object.
The clever bit is that large objects do not move through the driver. If task A produces a 4 GB tensor and task B consumes it, the tensor stays in worker shared memory and B reads it via zero-copy. That is how Ray keeps Python distributed compute close to bare-metal throughput.
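A minimal sketch of that pattern: the producer's ObjectRef is handed straight to the consumer, so the array never passes through the driver (sizes shrunk here so it runs on a laptop).

```python
import numpy as np
import ray

ray.init()

@ray.remote
def produce():
    # Result lands in the Plasma store of whichever worker runs this task.
    return np.ones((1000, 1000), dtype=np.float32)

@ray.remote
def consume(arr):
    # Ray resolves the ObjectRef argument before the task runs; on the same
    # node the numpy array is read zero-copy out of shared memory.
    return float(arr.sum())

ref = produce.remote()                 # an ObjectRef, not the array itself
total = ray.get(consume.remote(ref))   # only the small float comes back to the driver
print(total)
```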
The libraries on top
Ray ships Ray Core plus five batteries-included libraries, and those libraries are the actual reason most teams adopt it.
| Library | What it does | When to use |
|---|---|---|
| Ray Core | Tasks, actors, objects | Any custom distributed Python |
| Ray Data | Parallel data preprocessing | Replacing Spark for Python ETL |
| Ray Train | Distributed model training | PyTorch, JAX, TensorFlow at scale |
| Ray Tune | Hyperparameter search | Any sweep above 50 trials |
| Ray Serve | Online model serving | Replicated, scalable inference |
| RLlib | Reinforcement learning | PPO, DQN, IMPALA at scale |
Each is independently useful. The composability — training in Ray Train, tuning with Ray Tune, serving with Ray Serve, all on the same cluster — is the strategic moat.
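As an illustration of how little glue the composition needs, here is a hedged sketch of a Ray Tune sweep using the Ray 2.x Tuner API; the objective function and search space are placeholders, and Train or Serve workloads plug into the same cluster the same way.

```python
import ray
from ray import tune

ray.init()

def objective(config):
    # Stand-in for a real training loop; returns the metric Tune optimizes.
    loss = (config["lr"] - 0.01) ** 2
    return {"loss": loss}

tuner = tune.Tuner(
    objective,
    param_space={"lr": tune.loguniform(1e-4, 1e-1)},
    tune_config=tune.TuneConfig(num_samples=20, metric="loss", mode="min"),
)
results = tuner.fit()
print(results.get_best_result().config)
```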
For a deeper feature-by-feature breakdown see our pillar on Ray vs alternatives.
Why people compare it to Apache Spark
The naming similarity, plus the fact that both are general-purpose distributed compute frameworks. The actual differences:
| Topic | Ray | Spark |
|---|---|---|
| Founded | 2017, RISELab | 2009, AMPLab |
| Language core | C++ with Python | Scala on JVM |
| Foundation | None (Anyscale-led) | Apache Software Foundation |
| Task latency | ~200 microseconds | ~1 to 3 seconds per stage |
| State | Stateful actors built in | Stateless |
| Sweet spot | ML and Python compute | SQL and analytics |
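The stateful-actors row is the difference that matters most in practice: a Ray actor is a class pinned to its own worker process that keeps state between calls, something Spark has no direct equivalent for. A minimal sketch:

```python
import ray

ray.init()

@ray.remote
class Counter:
    def __init__(self):
        self.count = 0  # lives in the actor process between calls

    def increment(self):
        self.count += 1
        return self.count

counter = Counter.remote()            # starts a dedicated worker process
refs = [counter.increment.remote() for _ in range(5)]
print(ray.get(refs))                  # [1, 2, 3, 4, 5] -- state persists across calls
```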
For a deep dive see our Ray vs Spark comparison.
Adoption and scale
Public references in 2026:
- OpenAI uses Ray for distributed training infrastructure, disclosed in the GPT-4 system card.
- Uber runs Ray at thousands of nodes for its ML platform.
- Shopify uses Ray Serve for product recommendation serving.
- Spotify uses Ray Tune for personalization model search.
- Pinterest, ByteDance, LinkedIn, Instacart all have public Ray case studies.
Anyscale published a 2024 survey claiming 38 percent of public-cloud ML workloads run on Ray. The number is self-reported and probably high. Even halved, the user base is enormous.
The four things that trip people up
Ray is not Apache. The project is governed by Anyscale and the open-source community via the Ray Project on GitHub. Anyscale offers a managed cloud platform; Ray itself is free under Apache 2.0.
Ray Serve is not a complete API gateway. It scales a single model. For multi-provider routing, fallbacks, rate limits, billing, and audit you need a layer on top. That is what AI gateways are for; see our LLM serving frameworks comparison.
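For context, what Serve does handle is replicating one deployment behind an HTTP endpoint. A minimal sketch, assuming the Ray 2.x Serve API:

```python
from ray import serve
from starlette.requests import Request

@serve.deployment(num_replicas=2)
class Model:
    async def __call__(self, request: Request) -> dict:
        payload = await request.json()
        # Stand-in for real inference.
        return {"echo": payload}

serve.run(Model.bind())  # serves on http://127.0.0.1:8000/ by default
```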
Ray clusters are stateful. The GCS holds task lineage. Killing the head node loses scheduled state unless you have externalized the GCS to Redis. Always do this in production.
Object spilling is not free. When the in-memory object store fills up, Ray spills to disk. Local NVMe is fine. Spilling to S3 is supported but multi-second per gigabyte. Set RAY_object_spilling_config carefully.
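A hedged sketch of pointing spilling at local NVMe via the _system_config escape hatch the Ray docs describe for object spilling; the RAY_object_spilling_config environment variable takes the same JSON, and the mount path here is a placeholder.

```python
import json
import ray

ray.init(
    _system_config={
        # Spill to a local NVMe mount instead of the default temp directory.
        "object_spilling_config": json.dumps(
            {"type": "filesystem", "params": {"directory_path": "/mnt/nvme/ray_spill"}}
        )
    }
)
```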
Cluster sizing rule of thumb
| Workload | Cluster size |
|---|---|
| Notebook prototyping | 1 node, 8-16 cores |
| Hyperparameter sweep, 200 trials | 4-8 GPU nodes |
| Distributed PyTorch (8B model) | 8-16 GPU nodes |
| LLM batch inference (1B docs) | 16-64 GPU nodes |
| Reinforcement learning at scale | 64-512 nodes |
| Frontier-scale training | 1,000 plus nodes |
Ray scales linearly to roughly 1,000 nodes without special tuning. Above that, ByteDance and OpenAI have run GCS patches for higher throughput; those patches were upstreamed in Ray 2.10 and later.
The Anyscale relationship in plain English
Anyscale is the company that employs most Ray maintainers. It sells:
- A managed Ray cloud product.
- Anyscale Workspaces, a hosted dev environment.
- Anyscale Jobs, a managed batch-job runner.
You do not need Anyscale to run Ray. KubeRay on a vanilla EKS or GKE cluster is fully featured. Anyscale is a convenience layer for teams who do not want to operate Kubernetes.
For the upstream Ray project see github.com/ray-project/ray and docs.ray.io. For a peer-reviewed architectural paper see the original Ray OSDI 2018 paper.
Where Ray fits with model APIs
Ray decides where your compute happens. It does not decide which model handles a request when you have multiple providers in play. Many production systems run Ray for self-hosted training and inference, and burst into managed APIs through a gateway like Swfte Connect when local capacity is saturated. That hybrid pattern is increasingly the default in 2026 because raw GPU supply remains tight.
Hands-on next steps
For working code, see our Ray Python distributed tutorial and Ray example workloads. Both walk through real cluster setup rather than toy examples.
What to do this quarter
- Stop searching for "Apache Ray." It is just Ray. Apache 2.0 license, no Apache Foundation involvement.
- Pip install Ray on a laptop and run the five-line example. The whole API is approachable in an afternoon.
- If you have a Python distributed workload, prototype it on Ray Core; a minimal port from multiprocessing is sketched after this list. Most teams find a 3x to 10x speedup over multiprocessing.
- Stand up a small KubeRay cluster on three nodes. Keep it as a permanent dev environment.
- Externalize the GCS to managed Redis before any production deployment. Skip this and lose state on every head-node restart.
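To make the Ray Core prototyping item concrete, here is a hedged sketch of porting a multiprocessing-style map to Ray tasks; the work function is a placeholder for your own workload.

```python
import time
import ray

ray.init()

def work(x):
    time.sleep(0.1)  # stand-in for real per-item work
    return x * x

# multiprocessing version, for comparison:
#   with multiprocessing.Pool() as pool:
#       results = pool.map(work, range(100))

# Ray version: same function, now scheduled across every core in the cluster.
work_remote = ray.remote(work)
results = ray.get([work_remote.remote(i) for i in range(100)])
print(sum(results))
```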