Batch inference with Ray Data

Load model state once and score large datasets efficiently.

Batch inference is one of Ray Data's most practical entry points. Teams can keep model code in Python while scaling over large datasets and GPU workers.

Use callable classes for model state

Load model weights once per worker by passing a callable class to map_batches. Ray Data constructs the class once per actor in the pool, so __init__ runs once per worker and __call__ runs once per batch.

    import ray

    class FraudScorer:
        def __init__(self):
            # Runs once per actor, so the weights are loaded once per worker.
            self.model = load_model("/mnt/models/fraud")

        def __call__(self, batch):
            # batch is a pandas DataFrame because batch_format="pandas".
            batch["score"] = self.model.predict_proba(batch[FEATURES])[:, 1]
            return batch

    # load_model, FEATURES, and the features dataset are defined elsewhere.
    scored = features.map_batches(
        FraudScorer,
        compute=ray.data.ActorPoolStrategy(size=8),
        batch_format="pandas",
    )

Think in throughput limits

Throughput depends on storage read speed, preprocessing cost, model latency, and write bandwidth. The first tuning pass should identify which of these is the current bottleneck before adding more workers, since extra workers do not help a pipeline that is limited by reads or writes.

Release checklist

- Pin model and feature versions.
- Emit row counts before and after filtering.
- Store prediction timestamps and model identifiers.
- Validate output schema before writing to the serving or analytics table.
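The release checklist items can be sketched as a single post-processing step on a pandas batch. This is a minimal illustration, not a prescribed implementation: MODEL_ID, FEATURE_VERSION, EXPECTED_COLUMNS, the account_id column, the min_score filter, and finalize_predictions are all hypothetical names chosen for the example, and the schema check is deliberately simple.

```python
import pandas as pd

# Hypothetical pinned identifiers; replace with the real release values.
MODEL_ID = "fraud-model-v1"
FEATURE_VERSION = "features-v1"

# Hypothetical output contract for the destination table.
EXPECTED_COLUMNS = {
    "account_id",
    "score",
    "model_id",
    "feature_version",
    "predicted_at",
}


def finalize_predictions(scored: pd.DataFrame, min_score: float = 0.0) -> pd.DataFrame:
    """Filter, stamp, and validate a scored batch before it is written."""
    # Emit row counts before and after filtering.
    rows_before = len(scored)
    kept = scored[scored["score"] >= min_score].copy()
    print(f"rows before filter: {rows_before}, after filter: {len(kept)}")

    # Store the prediction timestamp and the model/feature identifiers.
    kept["model_id"] = MODEL_ID
    kept["feature_version"] = FEATURE_VERSION
    kept["predicted_at"] = pd.Timestamp.now(tz="UTC")

    # Validate the output schema before writing.
    missing = EXPECTED_COLUMNS - set(kept.columns)
    if missing:
        raise ValueError(f"missing output columns: {sorted(missing)}")
    if kept["score"].isna().any() or not kept["score"].between(0.0, 1.0).all():
        raise ValueError("score must be a non-null probability in [0, 1]")
    return kept


# Small in-memory example standing in for one scored batch.
example = pd.DataFrame({"account_id": [1, 2, 3], "score": [0.1, 0.8, 0.95]})
result = finalize_predictions(example, min_score=0.5)
```

A function like this could also be passed to map_batches with batch_format="pandas". In that case the printed counts are per batch, so they would need to be aggregated to get a dataset-level total.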
