Documentation Index
Fetch the complete documentation index at: https://wb-21fd5541-style-guide-support-models-articles-20260527-00.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
Follow these best practices to handle W&B Serverless Inference errors and maintain reliable applications.
Always implement error handling
Wrap API calls in try-except blocks:
import openai
try:
response = client.chat.completions.create(
model="meta-llama/Llama-3.1-8B-Instruct",
messages=messages
)
except Exception as e:
print(f"Error: {e}")
# Handle error appropriately
Use retry logic with exponential backoff
Retry transient failures with increasing delays between attempts:
import time
from typing import Optional
def call_inference_with_retry(
client,
messages,
model: str,
max_retries: int = 3,
base_delay: float = 1.0
) -> Optional[str]:
for attempt in range(max_retries):
try:
response = client.chat.completions.create(
model=model,
messages=messages
)
return response.choices[0].message.content
except Exception as e:
if attempt == max_retries - 1:
raise
# Calculate delay with exponential backoff
delay = base_delay * (2 ** attempt)
print(f"Attempt {attempt + 1} failed, retrying in {delay}s...")
time.sleep(delay)
return None
Monitor your usage
- Track credit usage in the W&B Billing page.
- Set up alerts before you hit limits.
- Log API usage in your application.
Handle specific error codes
def handle_inference_error(error):
error_str = str(error)
if "401" in error_str:
# Invalid authentication
raise ValueError("Check your API key and project configuration")
elif "402" in error_str:
# Out of credits
raise ValueError("Insufficient credits")
elif "429" in error_str:
# Rate limited
return "retry"
elif "500" in error_str or "503" in error_str:
# Server error
return "retry"
else:
# Unknown error
raise
Set appropriate timeouts
Configure reasonable timeouts for your use case:
# For longer responses
client = openai.OpenAI(
base_url='https://api.inference.wandb.ai/v1',
api_key="your-api-key",
timeout=60.0 # 60 second timeout
)
Additional tips
- Log errors with timestamps for debugging.
- Use async operations to handle concurrency better.
- Implement circuit breakers for production systems.
- Cache responses when appropriate to reduce API calls.
Inference