I tried to train the DeepLabV3+ architecture on the MoNuSeg 2020 dataset, using a customized config with ResNet18 (converted to .pkl from https://download.pytorch.org/models/resnet18-f37072fd.pth) as the backbone:
```yaml
_BASE_: "detectron2/projects/DeepLab/configs/Cityscapes-SemanticSegmentation/deeplab_v3_plus_R_103_os16_mg124_poly_90k_bs16.yaml"
MODEL:
  WEIGHTS: "r18.pkl"
  BACKBONE:
    NAME: "build_resnet_backbone"
  RESNETS:
    DEPTH: 18
    RES2_OUT_CHANNELS: 64
    STEM_OUT_CHANNELS: 64
    RES5_DILATION: 1
    NUM_GROUPS: 1
  ROI_HEADS:
    NUM_CLASSES: 1
```
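The .pth → .pkl conversion can be done with detectron2's tools/convert-torchvision-to-d2.py; below is a condensed sketch of the re-keying that script performs (the file names here are assumptions):

```python
# Condensed sketch of detectron2's tools/convert-torchvision-to-d2.py:
# re-key a torchvision ResNet checkpoint into detectron2's naming scheme.
import pickle

import torch

obj = torch.load("resnet18-f37072fd.pth", map_location="cpu")

newmodel = {}
for old_k in list(obj.keys()):
    k = old_k
    if "layer" not in k:
        k = "stem." + k  # conv1 / bn1 / fc -> stem.*
    for t in [1, 2, 3, 4]:
        k = k.replace("layer{}".format(t), "res{}".format(t + 1))  # layer1 -> res2, ...
    for t in [1, 2, 3]:
        k = k.replace("bn{}".format(t), "conv{}.norm".format(t))   # bn1 -> conv1.norm, ...
    k = k.replace("downsample.0", "shortcut")
    k = k.replace("downsample.1", "shortcut.norm")
    newmodel[k] = obj.pop(old_k).detach().numpy()

res = {"model": newmodel, "__author__": "torchvision", "matching_heuristics": True}
with open("r18.pkl", "wb") as f:
    pickle.dump(res, f)
```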
The training code:

```python
import os

from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer
from detectron2.projects.deeplab import add_deeplab_config

cfg = get_cfg()
add_deeplab_config(cfg)  # register the DeepLab-specific keys before merging the YAML
cfg.merge_from_file('/kaggle/input/deeplab-v3-plus-models/deeplab_v3_plus_R_18_os16_mg124_poly_90k_bs16.yaml')
cfg.DATASETS.TRAIN = ('monuseg_train',)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
cfg.SOLVER.IMS_PER_BATCH = 16
cfg.SOLVER.BASE_LR = 0.01
cfg.SOLVER.MAX_ITER = 300
cfg.SOLVER.LR_SCHEDULER_NAME = 'WarmupMultiStepLR'
cfg.SOLVER.STEPS = []
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # note: SemanticSegmentor reads MODEL.SEM_SEG_HEAD.NUM_CLASSES, not this key
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```
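Before this runs, the 'monuseg_train' dataset has to be registered. A minimal registration sketch for semantic segmentation (the paths, sizes, and class names here are placeholders, not my exact loader):

```python
# Hypothetical registration sketch for 'monuseg_train'; paths and metadata
# are placeholders. SemanticSegmentor reads per-pixel label images via the
# "sem_seg_file_name" key of each dataset dict.
from detectron2.data import DatasetCatalog, MetadataCatalog

def monuseg_train_dicts():
    return [
        {
            "file_name": "/kaggle/input/monuseg-2020/train/img_001.png",           # RGB image
            "sem_seg_file_name": "/kaggle/input/monuseg-2020/train/mask_001.png",  # label PNG
            "height": 512,
            "width": 512,
        },
        # ... one dict per training image
    ]

DatasetCatalog.register("monuseg_train", monuseg_train_dicts)
MetadataCatalog.get("monuseg_train").set(
    stuff_classes=["nucleus"],  # single foreground class
    ignore_label=255,           # label value excluded from the loss
)
```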
I also tried the Trainer class from the DeepLab project's train_net.py, switching the scheduler to the one it supports:

```python
cfg.SOLVER.LR_SCHEDULER_NAME = 'WarmupPolyLR'
trainer = Trainer(cfg)
```
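Here Trainer is the DefaultTrainer subclass defined in the project's train_net.py; it overrides build_lr_scheduler so that 'WarmupPolyLR' is recognized. A sketch of how it can be imported (the clone path is an assumption):

```python
import sys

# Make the DeepLab project's train_net.py importable; the path to the
# detectron2 clone is an assumption for this sketch.
sys.path.insert(0, "/kaggle/working/detectron2/projects/DeepLab")
from train_net import Trainer  # DefaultTrainer subclass with WarmupPolyLR support
```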
Training then fails. The full log:

```
[11/20 16:16:06 d2.engine.defaults]: Model:
SemanticSegmentor(
  (backbone): ResNet(
    (stem): BasicStem(
      (conv1): Conv2d(
        3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
        (norm): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (res2): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(
          64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (conv2): Conv2d(
          64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): BasicBlock(
        (conv1): Conv2d(
          64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (conv2): Conv2d(
          64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
    )
    (res3): Sequential(
      (0): BasicBlock(
        (shortcut): Conv2d(
          64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False
          (norm): SyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (conv1): Conv2d(
          64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False
          (norm): SyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (conv2): Conv2d(
          128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): SyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): BasicBlock(
        (conv1): Conv2d(
          128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): SyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (conv2): Conv2d(
          128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): SyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
    )
    (res4): Sequential(
      (0): BasicBlock(
        (shortcut): Conv2d(
          128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False
          (norm): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (conv1): Conv2d(
          128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False
          (norm): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (conv2): Conv2d(
          256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): BasicBlock(
        (conv1): Conv2d(
          256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (conv2): Conv2d(
          256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
    )
    (res5): Sequential(
      (0): BasicBlock(
        (shortcut): Conv2d(
          256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False
          (norm): SyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (conv1): Conv2d(
          256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False
          (norm): SyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (conv2): Conv2d(
          512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): SyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): BasicBlock(
        (conv1): Conv2d(
          512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): SyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (conv2): Conv2d(
          512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): SyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
    )
  )
  (sem_seg_head): DeepLabV3PlusHead(
    (decoder): ModuleDict(
      (res2): ModuleDict(
        (project_conv): Conv2d(
          64, 48, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): SyncBatchNorm(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (fuse_conv): Sequential(
          (0): Conv2d(
            304, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
          (1): Conv2d(
            256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
      )
      (res5): ModuleDict(
        (project_conv): ASPP(
          (convs): ModuleList(
            (0): Conv2d(
              512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
              (norm): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            )
            (1): Conv2d(
              512, 256, kernel_size=(3, 3), stride=(1, 1), padding=(6, 6), dilation=(6, 6), bias=False
              (norm): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            )
            (2): Conv2d(
              512, 256, kernel_size=(3, 3), stride=(1, 1), padding=(12, 12), dilation=(12, 12), bias=False
              (norm): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            )
            (3): Conv2d(
              512, 256, kernel_size=(3, 3), stride=(1, 1), padding=(18, 18), dilation=(18, 18), bias=False
              (norm): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            )
            (4): Sequential(
              (0): AvgPool2d(kernel_size=(16, 32), stride=1, padding=0)
              (1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
            )
          )
          (project): Conv2d(
            1280, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (fuse_conv): None
      )
    )
    (predictor): Conv2d(256, 19, kernel_size=(1, 1), stride=(1, 1))
    (loss): DeepLabCE(
      (criterion): CrossEntropyLoss()
    )
  )
)
[11/20 16:16:07 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in training: [RandomCrop(crop_type='absolute', crop_size=[512, 1024]), ResizeShortestEdge(short_edge_length=(512, 768, 1024, 1280, 1536, 1792, 2048), max_size=4096, sample_style='choice'), RandomFlip()]
[11/20 16:16:07 d2.data.build]: Using training sampler TrainingSampler
[11/20 16:16:07 d2.data.common]: Serializing the dataset using: <class 'detectron2.data.common.NumpySerializedList'>
[11/20 16:16:07 d2.data.common]: Serializing 37 elements to byte tensors and concatenating them all ...
[11/20 16:16:07 d2.data.common]: Serialized dataset takes 0.01 MiB
[11/20 16:16:12 d2.checkpoint.c2_model_loading]: Following weights matched with submodule backbone:
| Names in Model | Names in Checkpoint | Shapes |
|:------------------|:----------------------------------------------------------------------------------|:------------------------------------------|
| res2.0.conv1.* | res2.0.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (64,) (64,) (64,) (64,) (64,64,3,3) |
| res2.0.conv2.* | res2.0.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (64,) (64,) (64,) (64,) (64,64,3,3) |
| res2.1.conv1.* | res2.1.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (64,) (64,) (64,) (64,) (64,64,3,3) |
| res2.1.conv2.* | res2.1.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (64,) (64,) (64,) (64,) (64,64,3,3) |
| res3.0.conv1.* | res3.0.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (128,) (128,) (128,) (128,) (128,64,3,3) |
| res3.0.conv2.* | res3.0.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (128,) (128,) (128,) (128,) (128,128,3,3) |
| res3.0.shortcut.* | res3.0.shortcut.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (128,) (128,) (128,) (128,) (128,64,1,1) |
| res3.1.conv1.* | res3.1.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (128,) (128,) (128,) (128,) (128,128,3,3) |
| res3.1.conv2.* | res3.1.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (128,) (128,) (128,) (128,) (128,128,3,3) |
| res4.0.conv1.* | res4.0.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (256,) (256,) (256,) (256,) (256,128,3,3) |
| res4.0.conv2.* | res4.0.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (256,) (256,) (256,) (256,) (256,256,3,3) |
| res4.0.shortcut.* | res4.0.shortcut.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (256,) (256,) (256,) (256,) (256,128,1,1) |
| res4.1.conv1.* | res4.1.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (256,) (256,) (256,) (256,) (256,256,3,3) |
| res4.1.conv2.* | res4.1.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (256,) (256,) (256,) (256,) (256,256,3,3) |
| res5.0.conv1.* | res5.0.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (512,) (512,) (512,) (512,) (512,256,3,3) |
| res5.0.conv2.* | res5.0.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (512,) (512,) (512,) (512,) (512,512,3,3) |
| res5.0.shortcut.* | res5.0.shortcut.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (512,) (512,) (512,) (512,) (512,256,1,1) |
| res5.1.conv1.* | res5.1.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (512,) (512,) (512,) (512,) (512,512,3,3) |
| res5.1.conv2.* | res5.1.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (512,) (512,) (512,) (512,) (512,512,3,3) |
| stem.conv1.* | stem.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (64,) (64,) (64,) (64,) (64,3,7,7) |
[11/20 16:16:14 d2.engine.train_loop]: Starting training from iteration 0
ERROR [11/20 16:16:22 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "/opt/conda/lib/python3.7/site-packages/detectron2/engine/defaults.py", line 494, in run_step
    self._trainer.run_step()
  File "/opt/conda/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 274, in run_step
    loss_dict = self.model(data)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/detectron2/modeling/meta_arch/semantic_seg.py", line 108, in forward
    features = self.backbone(images.tensor)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/detectron2/modeling/backbone/resnet.py", line 445, in forward
    x = self.stem(x)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/detectron2/modeling/backbone/resnet.py", line 356, in forward
    x = self.conv1(x)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/detectron2/layers/wrappers.py", line 117, in forward
    x = self.norm(x)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 731, in forward
    world_size = torch.distributed.get_world_size(process_group)
  File "/opt/conda/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 867, in get_world_size
    return _get_group_size(group)
  File "/opt/conda/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 325, in _get_group_size
    default_pg = _get_default_group()
  File "/opt/conda/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 430, in _get_default_group
    "Default process group has not been initialized, "
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
[11/20 16:16:22 d2.engine.hooks]: Total training time: 0:00:08 (0:00:00 on hooks)
[11/20 16:16:22 d2.utils.events]: iter: 0 lr: N/A max_mem: 7348M
```
I ran this on GPU T4 x2 and on GPU P100. Environment details:

```
---------------------- -------------------------------------------------------------------------------
sys.platform linux
Python 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:53) [GCC 9.4.0]
numpy 1.21.6
detectron2 0.6 @/opt/conda/lib/python3.7/site-packages/detectron2
Compiler GCC 9.4
CUDA compiler CUDA 11.0
detectron2 arch flags 7.5
DETECTRON2_ENV_MODULE <not set>
PyTorch 1.11.0 @/opt/conda/lib/python3.7/site-packages/torch
PyTorch debug build False
GPU available Yes
GPU 0,1 Tesla T4 (arch=7.5)
Driver version 470.82.01
CUDA_HOME /usr/local/cuda
Pillow 9.1.1
torchvision 0.12.0 @/opt/conda/lib/python3.7/site-packages/torchvision
torchvision arch flags 3.7, 6.0, 7.0, 7.5
fvcore 0.1.5.post20220512
iopath 0.1.9
cv2 4.5.4
---------------------- -------------------------------------------------------------------------------
PyTorch built with:
- GCC 9.4
- C++ Version: 201402
- Intel(R) oneAPI Math Kernel Library Version 2022.1-Product Build 20220311 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.5.2 (Git Hash a9302535553c73243c632ad3c4c80beec3d19a1e)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX512
- CUDA Runtime 11.0
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_70,code=compute_70;-gencode;arch=compute_75,code=compute_75
- CuDNN 8.0.5
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.0, CUDNN_VERSION=8.0.5, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
Testing NCCL connectivity ... this should not hang.
NCCL succeeded.
---------------------- ----------------------------------------------------------------
sys.platform linux
Python 3.7.15 (default, Oct 12 2022, 19:14:55) [GCC 7.5.0]
numpy 1.21.6
detectron2 0.6 @/usr/local/lib/python3.7/dist-packages/detectron2
Compiler GCC 7.5
CUDA compiler CUDA 11.2
detectron2 arch flags 7.5
DETECTRON2_ENV_MODULE <not set>
PyTorch 1.12.1+cu113 @/usr/local/lib/python3.7/dist-packages/torch
PyTorch debug build False
GPU available Yes
GPU 0 Tesla T4 (arch=7.5)
Driver version 460.32.03
CUDA_HOME /usr/local/cuda
Pillow 7.1.2
torchvision 0.13.1+cu113 @/usr/local/lib/python3.7/dist-packages/torchvision
torchvision arch flags 3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore 0.1.5.post20220512
iopath 0.1.9
cv2 4.6.0
---------------------- ----------------------------------------------------------------
PyTorch built with:
- GCC 9.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 11.3
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
- CuDNN 8.3.2 (built against CUDA 11.5)
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
```

It was due to the SyncBatchNorm layers, which I knew can cause trouble on a single GPU (a known PyTorch drawback), but which I somehow failed to spot in the log! The workaround, as stated here:

- print the config
- find all keys that have a value of "SyncBN" or similar
- edit the config file or code to set these values to "BN" instead (see the sketch below)
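In this setup the "SyncBN" values come from the Cityscapes base YAML. Assuming the standard detectron2 config keys, the override is just a couple of lines after merge_from_file():

```python
# Minimal sketch of the workaround, assuming the standard detectron2 keys.
# Run this after cfg.merge_from_file(...) so the base YAML does not clobber it.
print(cfg.dump())  # dump the merged config and search the output for "SyncBN"

cfg.MODEL.RESNETS.NORM = "BN"       # backbone: SyncBatchNorm -> plain BatchNorm
cfg.MODEL.SEM_SEG_HEAD.NORM = "BN"  # DeepLab head: SyncBatchNorm -> plain BatchNorm
```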