
I'm researching self-supervised machine learning code.

I want to debug it with a proper Python debugger rather than pdb.set_trace(). This is the command I run in an Ubuntu terminal:

python -m torch.distributed.launch --nproc_per_node=1 main_swav.py \
--data_path /dataset/imagenet/train \
--epochs 400 \
--base_lr 0.6 \
--final_lr 0.0006 \
--warmup_epochs 0 \
--batch_size 8 \
--size_crops 224 96 \
--nmb_crops 2 6 \
--min_scale_crops 0.14 0.05 \
--max_scale_crops 1. 0.14 \
--use_fp16 true \
--freeze_prototypes_niters 5005 \
--queue_length 380 \
--epoch_queue_starts 15 \
--workers 10

To debug the code in VS Code, I tried revising launch.json as below, following a Stack Overflow question:

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Current File",
            "type": "python",
            "module": "torch.distributed.launch --nproc_per_node=1 main_swav.py",
            "request": "launch",
            "console": "integratedTerminal",
            "args": ["--data_path", "/dataset/imagenet/train"]
        }
    ]
}

I knew this would not work...

Could you give me some advice?

2 Comments
  • args is used to pass command-line arguments along to the app being launched; program is used to specify the Python file. See the Python debugging in VS Code documentation before you start debugging.
  • Note that as of 2024, "type": "python" is deprecated. Use "debugpy" instead: see code.visualstudio.com/docs/python/debugging#_module

5 Answers


Specify the module you want to run with "module": "torch.distributed.launch"

You can ignore the -m flag. Put everything else under the args key.

Note: Make sure to include --nproc_per_node and the name of the file (main_swav.py) in the list of arguments.

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Current File",
            "type": "debugpy",
            "module": "torch.distributed.launch",
            "request": "launch",
            "console": "integratedTerminal",
            "args": [
                "--nproc_per_node", "1", 
                "main_swav.py",
                "--data_path", "/dataset/imagenet/train"
            ]
        }
    ]
}

Read more here: https://code.visualstudio.com/docs/python/debugging#_module
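For context, each entry in the "args" list is delivered to the launched program as a separate sys.argv element. A minimal sketch (using a hypothetical subset of main_swav.py's flags, parsed with argparse) shows how the launched script would receive them:

```python
import argparse

# Hypothetical subset of main_swav.py's flags, for illustration only
parser = argparse.ArgumentParser()
parser.add_argument("--data_path", type=str)
parser.add_argument("--epochs", type=int, default=400)

# Simulate what the debugger passes: one list element per argument
args = parser.parse_args(["--data_path", "/dataset/imagenet/train"])
print(args.data_path)  # /dataset/imagenet/train
print(args.epochs)     # 400
```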


6 Comments

Any update on this for the newer torchrun?
@aviator set your module to torch.distributed.run
@ringo That would discard the convenience of torchrun and require setting various distribution-related environment variables yourself. Why not just run it as program: {torchrun_path}?
I don't think so. torchrun is just a small Python script that calls torch.distributed.run.main, so running torch.distributed.run as a module does the same thing.
If you have a program entry (e.g., "program": "${workspaceFolder}/main.py"), remove it and add it to args after "--nproc_per_node", "1".
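To see why the two are equivalent: a console script like torchrun is just a thin wrapper that forwards sys.argv to a module's main() function. A conceptual sketch (not the actual torchrun source; all names here are illustrative):

```python
import sys

def make_entry_point(main):
    """Build a console-script-style callable that forwards CLI args to main()."""
    def entry():
        return main(sys.argv[1:])
    return entry

def fake_launcher_main(argv):
    # Stand-in for torch.distributed.run.main: just report what it received
    return "launching with " + " ".join(argv)

run = make_entry_point(fake_launcher_main)
sys.argv = ["torchrun", "--nproc_per_node", "1", "train.py"]
print(run())  # launching with --nproc_per_node 1 train.py
```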

This is an example of my launch.json that I use to debug Python modules.

It has an additional configuration to debug the "current file" (not as a module), which is useful to keep.

{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Python: Module",
      "type": "python",
      "request": "launch",
      "module": "path.to.module",
      "args": ["run_example", "--arg", "<arg>"],
      "justMyCode": true
    },
    {
      "name": "Python: Current File",
      "type": "python",
      "request": "launch",
      "program": "${file}",
      "console": "integratedTerminal",
      "justMyCode": true
    }
  ]
}

This would replicate a terminal command to run a Python module like so:

python -m path.to.module run_example --arg <arg>
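One pitfall worth noting: VS Code passes each item of the "args" list as a single argv element, so several flags bundled into one string reach the program as one token. A small argparse sketch illustrating the difference:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("command")
parser.add_argument("--arg")

# Bundled into one string: argparse sees a single positional token
ns_bad = parser.parse_args(["run_example --arg value"])
print(ns_bad.command)  # 'run_example --arg value'
print(ns_bad.arg)      # None

# One list item per token: parsed as intended
ns_good = parser.parse_args(["run_example", "--arg", "value"])
print(ns_good.command, ns_good.arg)  # run_example value
```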

Comments


For the new torchrun (Torch 2.0.1), here is the launch.json:

  • "module" is no longer used
  • the full path to torchrun is needed (at least for me)
{
    "name": "working",
    "type": "python",
    "request": "launch",
    "program": "/home/user_x/anaconda3/envs/env_y/bin/torchrun",
    "console": "integratedTerminal",
    "justMyCode": true,
    "args": [
        "--nproc_per_node", "1",
        "example_chat_completion.py",
        "--ckpt_dir", "llama-2-7b-chat/",
        "--tokenizer_path", "tokenizer.model",
        "--max_seq_len", "512",
        "--max_batch_size", "6"
    ]
},

Comments


Related issues: https://github.com/microsoft/debugpy/issues/1311

First, the .vscode/launch.json configuration is:

{
    // kill -9 $(pgrep -f "python3 -m debugpy" | xargs echo)
    // python -m debugpy --listen 5678 --wait-for-client xxx.py
    // python -m debugpy --listen 5678 --wait-for-client -m torch.distributed.run --nproc_per_node 1 --nnodes 1 xxx.py
    "version": "0.2.0",
    "configurations": [
        {
            "name": "LDebug",
            "type": "python",
            "request": "attach",
            "connect": {
                "host": "localhost",
                "port": 5678
            }
        }
    ]
}

For debugging with torchrun, run:

python -m debugpy --listen 5678 --wait-for-client -m torch.distributed.run --nproc_per_node 1 --nnodes 1 xxx.py

For a normal Python debug session, run:

python -m debugpy --listen 5678 --wait-for-client xxx.py

After executing the command, press F1 (or click Start Debugging) to attach to the port.

Comments


For torch>=2.7, use torch.distributed.run:

{
  "name": "train",
  "type": "debugpy",
  "request": "launch",
  "module": "torch.distributed.run",
  "args": [
    "--nproc-per-node=1",
    "--standalone",
    "train.py"
  ]
}

Reference: torchrun (Elastic Launch) — PyTorch 2.7 documentation

Comments
