This article explains how to use W&B in training programs that span multiple processes, such as distributed training jobs, so that runs are logged correctly without conflicts. If a training program uses multiple processes, structure it so that a process makes W&B method calls only if that process has called wandb.init(). Choose one of the following approaches to manage multiprocess training:
  • Call wandb.init() in all processes and use the group keyword argument to create a shared group. Each process has its own W&B run, and the UI groups the training processes together.
  • Call wandb.init() from only one process and pass data to log through multiprocessing queues.
Refer to Log distributed training experiments for detailed explanations of these approaches, including code examples with Torch DDP.
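
As a rough illustration of both approaches, the following is a minimal sketch, not the code from the linked guide. The project name "multiprocess-demo", the helper names (grouped_worker, worker, drain, main), the None sentinel, and the use of mode="offline" are all assumptions made for this example. Metrics from different workers arrive interleaved, so the sketch logs each worker's step as an ordinary metric instead of passing it as the W&B step.

```python
import multiprocessing as mp

SENTINEL = None  # hypothetical end-of-stream marker; one is sent per worker


def grouped_worker(rank, group):
    """Approach 1: every process calls wandb.init() with a shared group."""
    import wandb  # imported here so each process initializes its own run

    run = wandb.init(
        project="multiprocess-demo",  # assumed project name
        group=group,                  # shared value groups the runs in the UI
        name=f"rank-{rank}",
        mode="offline",               # assumption: avoids needing a login
    )
    run.log({"loss": 1.0 / (rank + 1)})
    run.finish()


def worker(rank, queue, steps=3):
    """Approach 2: the worker never calls W&B; it only sends metrics."""
    for step in range(steps):
        queue.put({"rank": rank, "worker_step": step, "loss": 1.0 / (step + 1)})
    queue.put(SENTINEL)


def drain(queue, num_workers, log_fn):
    """Forward queued metrics to log_fn until every worker has finished."""
    finished = 0
    logged = 0
    while finished < num_workers:
        item = queue.get()
        if item is SENTINEL:
            finished += 1
            continue
        log_fn(item)
        logged += 1
    return logged


def main(num_workers=2):
    """Approach 2 driver: only this process calls wandb.init()."""
    import wandb

    run = wandb.init(project="multiprocess-demo", mode="offline")
    queue = mp.Queue()
    procs = [mp.Process(target=worker, args=(rank, queue))
             for rank in range(num_workers)]
    for proc in procs:
        proc.start()
    drain(queue, num_workers, run.log)  # drain before join so workers can exit
    for proc in procs:
        proc.join()
    run.finish()
```

To try the queue approach, call main() from under an `if __name__ == "__main__":` guard, which multiprocessing requires on platforms that spawn new interpreters. Because drain() accepts any callable, the queue logic can be checked without W&B by passing a list's append method in place of run.log.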
