This page explains why Serverless Inference returns 429 rate limit errors and how to resolve them so your requests succeed within the allowed concurrency limits.

Rate limit errors (HTTP 429) occur when the number of concurrent requests exceeds your concurrency limit.

Error: “Concurrency limit reached for requests”

Solution: To resolve the error, do one or more of the following:
  • Reduce the number of parallel requests.
  • Add delays between requests.
  • Implement exponential backoff.
Note: Rate limits apply per W&B project.
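You can combine the first two options in a client-side worker pool. The following is a minimal sketch that assumes your code makes each Inference call through a single Python function. `run_with_limited_concurrency`, `send_request`, `max_concurrent`, and `pause_s` are hypothetical names, and the defaults are illustrative values, not W&B's actual limits. Rate limits apply per project, so every process that shares a project draws on the same budget. A per-process cap like this one may need to be smaller when several processes run at once.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def _call_then_pause(send_request, payload, pause_s):
    """Make one request, then pause before this worker takes the next one."""
    try:
        return send_request(payload)
    finally:
        # Runs even if the request raised, so failed requests are spaced out too.
        time.sleep(pause_s)


def run_with_limited_concurrency(send_request, payloads, max_concurrent=4, pause_s=0.5):
    """Send every payload with at most max_concurrent requests in flight.

    send_request is a hypothetical callable that makes one Inference call;
    max_concurrent and pause_s are illustrative defaults, not W&B limits.
    Results come back in the same order as payloads.
    """
    # The pool size caps how many send_request calls run at the same time.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        futures = [
            pool.submit(_call_then_pause, send_request, p, pause_s) for p in payloads
        ]
        # result() re-raises any exception from send_request in the caller.
        return [f.result() for f in futures]
```

For example, `run_with_limited_concurrency(call_model, prompts, max_concurrent=2)` keeps at most two requests in flight, where `call_model` is your own function. To handle any 429 responses that still occur, wrap `send_request` in retry logic.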

Best practices to avoid rate limits

The following practices help your application stay within concurrency limits and recover gracefully when it hits limits.
  • Implement retry logic with exponential backoff: Backoff waits progressively longer between retries, so in-flight requests can finish and free up concurrency before the next attempt.
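    import time
    
    def retry_with_backoff(func, max_retries=3):
        """Call func(), retrying rate limit (429) errors with exponential backoff."""
        for attempt in range(max_retries):
            try:
                return func()
            except Exception as e:
                # Assumes the error message includes the 429 status code.
                # Retry only 429 errors, and only while attempts remain.
                if "429" in str(e) and attempt < max_retries - 1:
                    # Wait 1s, 2s, 4s, ... before the next attempt.
                    time.sleep(2 ** attempt)
                else:
                    raise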
    
  • Use batch processing instead of parallel requests.
  • Monitor your usage on the W&B Billing page.
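Batch processing can be sketched in a similar way. This minimal version assumes that "batch processing" means sending requests one at a time in fixed-size groups, with a pause between groups. `process_in_batches`, `send_request`, `batch_size`, and `pause_s` are hypothetical names with illustrative defaults.

```python
import time


def process_in_batches(send_request, items, batch_size=8, pause_s=1.0):
    """Process items in fixed-size batches, one request at a time.

    send_request is a hypothetical callable that makes one Inference call;
    batch_size and pause_s are illustrative defaults, not W&B limits.
    Results come back in the same order as items.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    items = list(items)
    results = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        # Requests within a batch run sequentially, never in parallel.
        for item in batch:
            results.append(send_request(item))
        # Pause before the next batch, but not after the last one.
        if start + batch_size < len(items):
            time.sleep(pause_s)
    return results
```

Because only one request is in flight at any moment, this approach trades throughput for staying well under the concurrency limit. The pause between batches gives other clients in the same project room as well.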

Default spending caps

Accounts also have default spending caps that bound overall Inference usage:
  • Pro accounts: $6,000 per month
  • Enterprise accounts: $700,000 per year
Contact your account executive or support to adjust limits.