Ray Core for Python Systems

1Object store and scheduling2Scheduling scenario
Back to modules
Course progress50%
article

Object store and scheduling

Keep data movement and resource intent visible.

Object store and scheduling basics

Ray separates task scheduling from object movement. That separation is what lets an application submit many small units of work while keeping large intermediate results in the distributed object store.

Object references

Remote calls return references. A reference is not the data itself; it is a handle Ray can pass through downstream tasks without pulling bytes through the driver.

@ray.remote
def load_partition(path):
    return read_parquet_partition(path)

@ray.remote
def summarize(table):
    return table["value"].mean()

table_ref = load_partition.remote("s3://bucket/day=2026-06-18/")
summary_ref = summarize.remote(table_ref)

Placement and resources

Ray schedules work against available CPU, GPU, memory, and custom resources. Good applications make resource intent visible instead of hiding it inside task code.

@ray.remote(num_cpus=2, memory=4 * 1024**3)
def transform(batch):
    return expensive_transform(batch)

Common failure mode

The driver becomes a bottleneck when it repeatedly calls ray.get inside a loop. Prefer submitting a window of work and using ray.wait when you need backpressure.

Object store and scheduling

Scheduling and objects