Ray Core for Python Systems
Object store and scheduling
Keep data movement and resource intent visible.
Object store and scheduling basics
Ray separates task scheduling from object movement. That separation is what lets an application submit many small units of work while keeping large intermediate results in the distributed object store.
Object references
Remote calls return references. A reference is not the data itself; it is a handle Ray can pass through downstream tasks without pulling bytes through the driver.
@ray.remote
def load_partition(path):
return read_parquet_partition(path)
@ray.remote
def summarize(table):
return table["value"].mean()
table_ref = load_partition.remote("s3://bucket/day=2026-06-18/")
summary_ref = summarize.remote(table_ref)
Placement and resources
Ray schedules work against available CPU, GPU, memory, and custom resources. Good applications make resource intent visible instead of hiding it inside task code.
@ray.remote(num_cpus=2, memory=4 * 1024**3)
def transform(batch):
return expensive_transform(batch)
Common failure mode
The driver becomes a bottleneck when it repeatedly calls ray.get inside a loop. Prefer submitting a window of work and using ray.wait when you need backpressure.