pytorch, gpu, amd-gpu, directml

RuntimeError: Cannot set version_counter for inference tensor - Trying DirectML in an AI project for AMD


I am converting a PyTorch CUDA project (https://github.com/suno-ai/bark) to DirectML so that I can use my AMD GPU (RX 6700 XT), and I am hitting this error: RuntimeError: Cannot set version_counter for inference tensor. I tried writing to the developer, but he says he has no experience with AMD.

Following the gpu-pytorch-windows docs, I changed every .to(device) call to .to(dml) in generation.py. The modified files are generation.py in the bark folder and its copy in build\lib\bark\. When I run the project, the GPU appears to start correctly, but then I get the error shown below.
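For reference, the device swap described above can be sketched roughly as follows. This is a minimal sketch, assuming the torch-directml package is installed; pick_device is a hypothetical helper name that is not part of bark, and the commented usage lines only illustrate where dml would replace device.

```python
def pick_device():
    """Return a DirectML device if torch-directml is usable, otherwise "cpu"."""
    try:
        import torch_directml  # provided by the torch-directml package
    except ImportError:
        # torch-directml (or torch itself) is not installed: fall back to CPU.
        return "cpu"
    if torch_directml.is_available():
        return torch_directml.device()  # default DirectML adapter
    return "cpu"


dml = pick_device()

# Hypothetical usage in generation.py, replacing the CUDA device:
#   model = model.to(dml)
#   x = x.to(dml)
```

Falling back to "cpu" keeps the script runnable on machines without a DirectML adapter, so the same code path can be tested with and without the GPU.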

I really appreciate all the help you've given me so far. I was hoping you could help me out again. I've been reading a lot and trying different things, but I can't find much information on this error I'm getting. I don't know what to do or if I'm doing something wrong.

From here: https://github.com/suno-ai/bark/issues/271

python .\run.py
No GPU being used. Careful, inference might be very slow!
  0%|                                                                                                                                                | 0/100 [00:00<?, ?it/s]Traceback (most recent call last):
  File "C:\Users\NoeXVanitasXJunk\bark\run.py", line 13, in <module>
    audio_array = generate_audio(text_prompt)
  File "C:\Users\NoeXVanitasXJunk\bark\bark\api.py", line 107, in generate_audio
    semantic_tokens = text_to_semantic(
  File "C:\Users\NoeXVanitasXJunk\bark\bark\api.py", line 25, in text_to_semantic
    x_semantic = generate_text_semantic(
  File "C:\Users\NoeXVanitasXJunk\bark\bark\generation.py", line 460, in generate_text_semantic
    logits, kv_cache = model(
  File "C:\Users\NoeXVanitasXJunk\miniconda3\envs\tfdml_plugin\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\NoeXVanitasXJunk\bark\bark\model.py", line 208, in forward
    x, kv = block(x, past_kv=past_layer_kv, use_cache=use_cache)
  File "C:\Users\NoeXVanitasXJunk\miniconda3\envs\tfdml_plugin\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\NoeXVanitasXJunk\bark\bark\model.py", line 121, in forward
    attn_output, prev_kvs = self.attn(self.ln_1(x), past_kv=past_kv, use_cache=use_cache)
  File "C:\Users\NoeXVanitasXJunk\miniconda3\envs\tfdml_plugin\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\NoeXVanitasXJunk\bark\bark\model.py", line 50, in forward
    q, k ,v  = self.c_attn(x).split(self.n_embd, dim=2)
  File "C:\Users\NoeXVanitasXJunk\miniconda3\envs\tfdml_plugin\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\NoeXVanitasXJunk\miniconda3\envs\tfdml_plugin\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: Cannot set version_counter for inference tensor
  0%|                                                                                                                                                | 0/100 [00:00<?, ?it/s]

I am using Python 3.9.16.

Could you help me with this?

One other thing: I have read that torch-mlir makes it possible to use an AMD card, but I am not sure whether it works on Windows. I installed torch-mlir and it works in the project, but it only uses the CPU, not the GPU. I don't know whether I need something extra, such as a special DirectML build, or how to configure it to use the GPU.

Update 2:

When I turn inference mode off, that is, replace inference_mode() with inference_mode(mode=False), the first stage completes, but I get the following warning and then a RuntimeError with no message:

UserWarning: The operator 'aten::tril.out' is not currently supported on the DML backend and will fall back to run on the CPU. This may have performance implications

No GPU being used. Careful, inference might be very slow!
  0%|                                                                                          | 0/100 [00:00<?, ?it/s]C:\Users\NoeXVanitasXJunk\bark\bark\model.py:80: UserWarning: The operator 'aten::tril.out' is not currently supported on the DML backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at D:\a\_work\1\s\pytorch-directml-plugin\torch_directml\csrc\dml\dml_cpu_fallback.cpp:17.)
  y = torch.nn.functional.scaled_dot_product_attention(q, k, v, dropout_p=self.dropout, is_causal=is_causal)
100%|████████████████████████████████████████████████████████████████████████████████| 100/100 [00:32<00:00,  3.04it/s]
  0%|                                                                                           | 0/31 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\NoeXVanitasXJunk\bark\run.py", line 13, in <module>
    audio_array = generate_audio(text_prompt)
  File "C:\Users\NoeXVanitasXJunk\bark\bark\api.py", line 113, in generate_audio
    out = semantic_to_waveform(
  File "C:\Users\NoeXVanitasXJunk\bark\bark\api.py", line 54, in semantic_to_waveform
    coarse_tokens = generate_coarse(
  File "C:\Users\NoeXVanitasXJunk\bark\bark\generation.py", line 633, in generate_coarse
    x_in = torch.hstack(
RuntimeError
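
To illustrate why the inference_mode(mode=False) change in this update gets past the first error: tensors created under torch.inference_mode() are flagged as inference tensors, which is the kind of tensor the original "Cannot set version_counter for inference tensor" message refers to. The sketch below only demonstrates that flag in plain PyTorch on the CPU; inference_flags is a hypothetical helper name, and nothing here is taken from bark or from the DirectML plugin.

```python
def inference_flags():
    """Report whether tensors created under each context are inference tensors."""
    import torch  # imported lazily so PyTorch is only needed when this is called

    with torch.inference_mode():
        a = torch.ones(2)  # created with inference mode enabled
    with torch.inference_mode(mode=False):
        b = torch.ones(2)  # mode=False makes the context a no-op
    with torch.no_grad():
        c = torch.ones(2)  # gradients disabled, but a normal tensor

    return {
        "inference_mode()": a.is_inference(),
        "inference_mode(mode=False)": b.is_inference(),
        "no_grad()": c.is_inference(),
    }
```

Calling inference_flags() returns True for the first entry and False for the other two. Note that inference_mode(mode=False) does not disable gradient tracking on its own, so wrapping generation in torch.no_grad() may be a reasonable substitute when a backend cannot handle inference tensors.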

Solution

  • This bug was resolved thanks to the user JonathanFly, who has been working on a port of bark with DirectML support for AMD GPUs. It now works on Windows.

    https://github.com/JonathanFly/bark/tree/bark_amd_directml_test#-bark-amd-install-test

    Thank you guys!!!