This page describes the pricing model, concurrency limits, and geographic restrictions that apply to W&B Serverless RL. Review this information to estimate costs and to understand the constraints that affect how you run training and inference workloads.

Pricing

Pricing has three components: inference, training, and storage. For specific billing rates, visit our pricing page. The following sections describe each component.

Inference

Pricing for Serverless RL inference requests matches Serverless Inference pricing. See model-specific costs. Learn more about purchasing credits, account tiers, and usage caps in the Serverless Inference docs.

Training

At each training step, Serverless RL collects batches of trajectories that include your agent’s outputs and the associated rewards, which your reward function calculates. Serverless RL uses the batched trajectories to update the weights of a LoRA adapter that specializes a base model for your task. The training jobs that update these LoRAs run on dedicated GPU clusters that Serverless RL manages. Training is free during the public preview period.

Model storage

Serverless RL stores checkpoints of your trained LoRAs so you can evaluate, serve, or continue training them at any time. W&B bills storage monthly based on total checkpoint size and your pricing plan. Every plan includes at least 5 GB of free storage, which is enough for roughly 30 LoRAs. To save space, delete low-performing LoRAs. See the ART SDK for instructions.
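
As a rough planning aid, the figures above (5 GB free, roughly 30 LoRAs) imply an average checkpoint size of about 0.17 GB. The following sketch estimates how much storage might exceed the free allowance for a given number of checkpoints. It assumes that only storage above the free allowance is billed and that all checkpoints are close to the implied average size. The function name and constants are illustrative and are not part of any W&B API. Actual checkpoint sizes depend on the base model and LoRA configuration, so check your real usage before relying on this estimate.

```python
# Figures taken from the text above.
FREE_STORAGE_GB = 5.0       # every plan includes at least 5 GB of free storage
LORAS_IN_FREE_TIER = 30     # "roughly 30 LoRAs" fit in that allowance

# Implied average checkpoint size (an estimate, not a published figure).
AVG_LORA_GB = FREE_STORAGE_GB / LORAS_IN_FREE_TIER


def storage_over_free_gb(num_checkpoints, avg_checkpoint_gb=AVG_LORA_GB,
                         free_gb=FREE_STORAGE_GB):
    """Estimate the storage (in GB) that exceeds the free allowance.

    Assumption: only storage above the free allowance is billed.
    """
    total_gb = num_checkpoints * avg_checkpoint_gb
    return max(0.0, total_gb - free_gb)


if __name__ == "__main__":
    for count in (10, 30, 60):
        print(count, "checkpoints:", round(storage_over_free_gb(count), 2), "GB over")
```

Under these assumptions, keeping only the best-performing checkpoints per run is the simplest way to stay within the free allowance.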

Limits

The following limits apply to Serverless RL usage. Review them when sizing workloads or when planning to use the service from a new region.
  • Inference concurrency limits: By default, Serverless RL supports up to 2,000 concurrent requests per user and 6,000 per project. If you exceed a concurrency limit, the Inference API returns a 429 response with the message "Concurrency limit reached for requests". To avoid this error, reduce the number of requests that your training job or production workload has in flight at once. If you need a higher limit, contact support@wandb.com.
  • Geographic restrictions: Serverless RL is only available in supported geographic locations. For more information, see the Terms of Service.
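
One common way to stay under the inference concurrency limits described above is to cap in-flight requests on the client side and back off when a 429 response arrives. The following is a minimal sketch that uses `asyncio.Semaphore` from the Python standard library. `make_request` is a hypothetical coroutine that you supply to send one inference request, and `ConcurrencyLimitError` is a hypothetical exception that your `make_request` would raise when it receives a 429. Neither name comes from the W&B or ART SDKs.

```python
import asyncio


class ConcurrencyLimitError(Exception):
    """Hypothetical: raised by make_request when the API answers HTTP 429."""


async def run_bounded(make_request, payloads, max_concurrency=100,
                      max_retries=3, base_delay=1.0):
    """Call make_request(payload) for each payload with a cap on in-flight calls.

    Results are returned in the same order as payloads.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(payload):
        for attempt in range(max_retries + 1):
            try:
                # Hold a slot only while the request is actually in flight.
                async with semaphore:
                    return await make_request(payload)
            except ConcurrencyLimitError:
                if attempt == max_retries:
                    raise
                # Exponential backoff; the slot is already released here.
                await asyncio.sleep(base_delay * 2 ** attempt)

    return await asyncio.gather(*(worker(p) for p in payloads))
```

When you choose `max_concurrency`, remember that the per-project limit is shared, so several users or jobs in the same project each need a cap well below 6,000.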