This article explains how to use W&B in training programs that span multiple processes, such as distributed training jobs, so that runs are logged correctly without conflicts. If a training program uses multiple processes, structure it so that W&B methods are called only from processes that have called `wandb.init()`.
Choose one of the following approaches to manage multiprocess training:
- Call `wandb.init()` in all processes and use the `group` keyword argument to create a shared group. Each process has its own W&B run, and the UI groups the training processes together.
- Call `wandb.init()` from only one process and pass the data to log through multiprocessing queues.
Refer to "Log distributed training experiments" for detailed explanations of these approaches, including code examples that use PyTorch DistributedDataParallel (DDP).