This page describes the pricing model, concurrency limits, and geographic restrictions that apply to W&B Serverless RL. Review this information to estimate costs and to understand the constraints that affect how you run training and inference workloads.
Pricing
Pricing has three components: inference, training, and storage. For specific billing rates, visit our pricing page. The following sections describe each component.
Inference
Pricing for Serverless RL inference requests matches Serverless Inference pricing. See model-specific costs. Learn more about purchasing credits, account tiers, and usage caps in the Serverless Inference docs.
Training
At each training step, Serverless RL collects batches of trajectories that include your agent’s outputs and associated rewards (calculated by your reward function). Serverless RL uses the batched trajectories to update the weights of a LoRA adapter that specializes a base model for your task. The training jobs to update these LoRAs run on dedicated GPU clusters that Serverless RL manages. Training is free during the public preview period.
Model storage
Serverless RL stores checkpoints of your trained LoRAs so you can evaluate, serve, or continue training them at any time. W&B bills storage monthly based on total checkpoint size and your pricing plan. Every plan includes at least 5 GB of free storage, which is enough for roughly 30 LoRAs. To save space, delete low-performing LoRAs. See the ART SDK for instructions.
Limits
The following limits apply to Serverless RL usage. Review them when sizing workloads or when planning to use the service from a new region.
- Inference concurrency limits: By default, Serverless RL supports up to 2,000 concurrent requests per user and 6,000 per project. If you exceed your rate limit, the Inference API returns a "429 Concurrency limit reached for requests" response. To avoid this error, reduce the number of concurrent requests your training job or production workload makes at once. If you need a higher rate limit, request one at support@wandb.com.
- Geographic restrictions: Serverless RL is only available in supported geographic locations. For more information, see the Terms of Service.
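For the inference concurrency limit above, one way to reduce concurrent requests is to cap in-flight calls on the client side and back off when a 429 response comes back. The following is a minimal sketch using Python's asyncio, not an official client. The names request_fn, ConcurrencyLimitError, call_with_limit, and run_all are hypothetical, and the default cap of 64 is an arbitrary assumption. The sketch assumes your own request function raises ConcurrencyLimitError when it receives a 429.

```python
import asyncio
import random

# Assumption: an arbitrary client-side cap, chosen to sit well below the
# default limit of 2,000 concurrent requests per user.
MAX_CONCURRENCY = 64


class ConcurrencyLimitError(Exception):
    """Hypothetical error your request function raises on an HTTP 429."""


async def call_with_limit(semaphore, request_fn, payload,
                          max_retries=5, base_delay=0.5):
    """Run request_fn(payload) under a shared semaphore, retrying on 429."""
    for attempt in range(max_retries + 1):
        async with semaphore:
            try:
                return await request_fn(payload)
            except ConcurrencyLimitError:
                if attempt == max_retries:
                    raise
        # Sleep outside the semaphore so a waiting task does not hold a slot.
        # Exponential backoff with jitter spreads out the retries.
        delay = base_delay * (2 ** attempt) * (1 + random.random())
        await asyncio.sleep(delay)


async def run_all(request_fn, payloads,
                  max_concurrency=MAX_CONCURRENCY, **retry_options):
    """Send every payload while keeping at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)
    tasks = [
        call_with_limit(semaphore, request_fn, payload, **retry_options)
        for payload in payloads
    ]
    # gather returns results in the same order as the payloads.
    return await asyncio.gather(*tasks)
```

To use the sketch, pass an async function that sends one request, for example asyncio.run(run_all(my_request_fn, payloads, max_concurrency=32)). The semaphore bounds only the requests that one process sends, so several workers sharing a user or project still need a combined budget below the limits listed above.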