Object store and scheduling

Keep data movement and resource intent visible.

Object store and scheduling basics Ray separates task scheduling from object movement. That separation is what lets an application submit many small units of work while keeping large intermediate results in the distributed object store. Object references Remote calls return references. A reference is not the data itself; it is a handle Ray can pass through downstream tasks without pulling bytes through the driver. @ray.remote def load_partition(path): return read_parquet_partition(path) @ray.remote def summarize(table): return table["value"].mean() table_ref = load_partition.remote("s3://bucket/day=2026-06-18/") summary_ref = summarize.remote(table_ref) Placement and resources Ray schedules work against available CPU, GPU, memory, and custom resources. Good applications make resource intent visible instead of hiding it inside task code. @ray.remote(num_cpus=2, memory=4 * 1024**3) def transform(batch): return expensive_transform(batch) Common failure mode The driver becomes a bottleneck when it repeatedly calls ray.get inside a loop. Prefer submitting a window of work and using ray.wait when you need backpressure.

Scheduling and objects