Ray Data for Batch AI

Batch inference with Ray Data

Load model state once and score large datasets efficiently.

Batch inference is one of Ray Data's most practical entry points. Teams can keep model code in Python while scaling over large datasets and GPU workers.

Use callable classes for model state

Pass a callable class to map_batches so that each worker loads the model weights once, in __init__, and reuses them for every batch it scores.

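import ray

# load_model, FEATURES, and features (a Ray Dataset) are assumed to be
# defined elsewhere in the pipeline.

class FraudScorer:
    def __init__(self):
        # Runs once per actor, so the weights are loaded once per worker.
        self.model = load_model("/mnt/models/fraud")

    def __call__(self, batch):
        # batch is a pandas DataFrame because batch_format="pandas".
        # Column 1 of predict_proba is taken as the positive-class probability.
        batch["score"] = self.model.predict_proba(batch[FEATURES])[:, 1]
        return batch

scored = features.map_batches(
    FraudScorer,
    compute=ray.data.ActorPoolStrategy(size=8),  # fixed pool of 8 actors
    batch_format="pandas",
)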

Think in throughput limits

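Throughput depends on storage read speed, preprocessing cost, model latency, and write bandwidth. The pipeline runs no faster than the slowest of these stages, so adding workers to a stage that is not the bottleneck does not raise throughput. The first tuning pass should identify the current bottleneck before adding more workers.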

Release checklist

  • Pin model and feature versions.
  • Emit row counts before and after filtering.
  • Store prediction timestamps and model identifiers.
  • Validate output schema before writing to the serving or analytics table.
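
Several of these checks can be automated as small functions that run just before the write step. The following is a minimal sketch in plain Python under a few assumptions: the function names, the max_drop_fraction threshold, and the metadata field names model_id and predicted_at are all hypothetical, and schemas are represented as column-name to dtype-string mappings.

```python
from datetime import datetime, timezone
from typing import Dict, List


def check_row_counts(rows_in: int, rows_out: int, max_drop_fraction: float) -> float:
    """Return the fraction of rows dropped by filtering; raise if it exceeds the limit."""
    if rows_out > rows_in:
        raise ValueError(f"row count grew from {rows_in} to {rows_out}")
    dropped = 0.0 if rows_in == 0 else (rows_in - rows_out) / rows_in
    if dropped > max_drop_fraction:
        raise ValueError(f"filtering dropped {dropped:.1%} of rows")
    return dropped


def validate_schema(actual: Dict[str, str], expected: Dict[str, str]) -> List[str]:
    """Compare column names and dtype strings; return a list of problems (empty if none)."""
    problems: List[str] = []
    for col, dtype in expected.items():
        if col not in actual:
            problems.append(f"missing column: {col}")
        elif actual[col] != dtype:
            problems.append(f"{col}: expected {dtype}, got {actual[col]}")
    for col in actual:
        if col not in expected:
            problems.append(f"unexpected column: {col}")
    return problems


def prediction_metadata(model_id: str) -> Dict[str, str]:
    """Build the metadata stored with each prediction: model identifier and UTC timestamp."""
    return {
        "model_id": model_id,
        "predicted_at": datetime.now(timezone.utc).isoformat(),
    }
```

With a pandas batch, the actual schema could be built as {col: str(dtype) for col, dtype in df.dtypes.items()} and compared against a pinned expected mapping. Failing the job on a non-empty problem list keeps malformed rows out of the serving or analytics table.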
