I am converting a PyTorch CUDA project (https://github.com/suno-ai/bark) to DirectML so I can use my AMD GPU (RX 6700 XT), and I am hitting RuntimeError: Cannot set version_counter for inference tensor.
I've tried writing to the developer, but he says he doesn't have experience with AMD.
I've changed all the .to(device) calls to .to(dml) in generation.py, following the gpu-pytorch-windows docs. The modified files are generation.py in the bark folder and in build\lib\bark\, respectively.
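For reference, this is roughly what the change looks like (a minimal sketch; the model and variable names here are mine, not Bark's actual ones):

```python
# Minimal sketch of the torch-directml change (assumption: the model and
# variable names are placeholders, not Bark's actual ones).
import torch
import torch_directml

# torch_directml.device() returns the default DirectML device
# (the RX 6700 XT in my case).
dml = torch_directml.device()

# Everywhere generation.py did model.to(device) / tensor.to(device),
# I move to the DirectML device instead:
model = torch.nn.Linear(4, 4).to(dml)   # stand-in for one of Bark's models
x = torch.randn(1, 4).to(dml)
print(model(x).device)                  # should report the DML device
```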
When I run the project I can see that the GPU starts correctly, but then I get the error below.
I really appreciate all the help you've given me so far, and I was hoping you could help me out again. I've been reading a lot and trying different things, but I can't find much information on this error, and I don't know whether I'm doing something wrong.
From here: https://github.com/suno-ai/bark/issues/271
python .\run.py
No GPU being used. Careful, inference might be very slow!
0%| | 0/100 [00:00<?, ?it/s]Traceback (most recent call last):
File "C:\Users\NoeXVanitasXJunk\bark\run.py", line 13, in <module>
audio_array = generate_audio(text_prompt)
File "C:\Users\NoeXVanitasXJunk\bark\bark\api.py", line 107, in generate_audio
semantic_tokens = text_to_semantic(
File "C:\Users\NoeXVanitasXJunk\bark\bark\api.py", line 25, in text_to_semantic
x_semantic = generate_text_semantic(
File "C:\Users\NoeXVanitasXJunk\bark\bark\generation.py", line 460, in generate_text_semantic
logits, kv_cache = model(
File "C:\Users\NoeXVanitasXJunk\miniconda3\envs\tfdml_plugin\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\NoeXVanitasXJunk\bark\bark\model.py", line 208, in forward
x, kv = block(x, past_kv=past_layer_kv, use_cache=use_cache)
File "C:\Users\NoeXVanitasXJunk\miniconda3\envs\tfdml_plugin\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\NoeXVanitasXJunk\bark\bark\model.py", line 121, in forward
attn_output, prev_kvs = self.attn(self.ln_1(x), past_kv=past_kv, use_cache=use_cache)
File "C:\Users\NoeXVanitasXJunk\miniconda3\envs\tfdml_plugin\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\NoeXVanitasXJunk\bark\bark\model.py", line 50, in forward
q, k ,v = self.c_attn(x).split(self.n_embd, dim=2)
File "C:\Users\NoeXVanitasXJunk\miniconda3\envs\tfdml_plugin\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\NoeXVanitasXJunk\miniconda3\envs\tfdml_plugin\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: Cannot set version_counter for inference tensor
0%| | 0/100 [00:00<?, ?it/s]
I am on Python 3.9.16. Could you help me with this?
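From what I've read since, the error means a tensor created under torch.inference_mode() is later touched by an op that needs to bump its version counter, which inference tensors forbid; the DirectML fallback path seems to trigger this. The two workarounds I keep seeing are to use torch.no_grad() instead, or to .clone() the tensor outside the inference context. A minimal sketch of both (generic tensors, not Bark's code):

```python
import torch

# Workaround 1: no_grad() disables autograd but does NOT create
# "inference tensors", so version counters can still be updated.
with torch.no_grad():
    x = torch.randn(2, 2)
x.add_(1.0)  # fine

# Workaround 2: clone the inference tensor outside the context;
# the clone is an ordinary tensor again.
with torch.inference_mode():
    y = torch.randn(2, 2)   # y is an inference tensor
y = y.clone()
y.add_(1.0)  # fine after the clone
```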
One more thing: I've read that torch-mlir makes it possible to use an AMD card, but I'm not sure whether it works on Windows, or whether I need something more, such as a DirectML-specific build. I installed torch-mlir and the project runs with it, but it only uses the CPU, not the GPU, and I'm not sure how to configure it to use the GPU.
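In case it matters for diagnosing the CPU-only behaviour, this is how I check what torch-directml can see (a small sketch, assuming the torch_directml helpers behave as I understand them; it only covers the DirectML side, since I don't know the equivalent for torch-mlir):

```python
import torch_directml

# List the adapters DirectML can see; the RX 6700 XT should appear here.
for i in range(torch_directml.device_count()):
    print(f"[{i}] {torch_directml.device_name(i)}")

# The device handle that gets passed to .to(...):
dml = torch_directml.device()
print("default DML device:", dml)
```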
Update 2:
When I try to disable inference mode, i.e. inference_mode() replaced by inference_mode(mode=False), the run gets further (the change is sketched below).
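Concretely, the change is just the mode flag; a sketch with a stand-in model (Bark wraps inference_mode in its own helpers, so this is not the literal diff):

```python
import torch

model = torch.nn.Linear(4, 4)   # stand-in, not one of Bark's models
x = torch.randn(1, 4)

# mode=False makes the context manager a no-op, so ordinary
# (non-inference) tensors come out and version counters work again.
with torch.inference_mode(mode=False):
    out = model(x)

print(out.is_inference())   # False
```

With that change the semantic stage completes, but the 'aten::tril.out' operator falls back to the CPU and generate_coarse then fails: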
No GPU being used. Careful, inference might be very slow!
0%| | 0/100 [00:00<?, ?it/s]C:\Users\NoeXVanitasXJunk\bark\bark\model.py:80: UserWarning: The operator 'aten::tril.out' is not currently supported on the DML backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at D:\a\_work\1\s\pytorch-directml-plugin\torch_directml\csrc\dml\dml_cpu_fallback.cpp:17.)
y = torch.nn.functional.scaled_dot_product_attention(q, k, v, dropout_p=self.dropout, is_causal=is_causal)
100%|████████████████████████████████████████████████████████████████████████████████| 100/100 [00:32<00:00, 3.04it/s]
0%| | 0/31 [00:00<?, ?it/s]
Traceback (most recent call last):
File "C:\Users\NoeXVanitasXJunk\bark\run.py", line 13, in <module>
audio_array = generate_audio(text_prompt)
File "C:\Users\NoeXVanitasXJunk\bark\bark\api.py", line 113, in generate_audio
out = semantic_to_waveform(
File "C:\Users\NoeXVanitasXJunk\bark\bark\api.py", line 54, in semantic_to_waveform
coarse_tokens = generate_coarse(
File "C:\Users\NoeXVanitasXJunk\bark\bark\generation.py", line 633, in generate_coarse
x_in = torch.hstack(
RuntimeError
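My guess at the time (unconfirmed, since the message is cut off) was that the tokens coming out of the semantic stage were still inference tensors, and that cloning them at the stage boundary, the standard fix for inference-tensor errors, might help. A speculative sketch with placeholder tensors:

```python
import torch

# Placeholder tensors standing in for the semantic/coarse histories;
# in the failing run these would have been created under inference_mode().
a = torch.arange(4)
b = torch.arange(4)

# .clone() outside any inference context yields ordinary tensors,
# so the hstack result can be mutated safely afterwards.
x_in = torch.hstack([a.clone(), b.clone()])
```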
Update: this bug has been resolved thanks to user JonathanFly, who has been working on a port of Bark with DirectML support for AMD GPUs. It now works on Windows.
https://github.com/JonathanFly/bark/tree/bark_amd_directml_test#-bark-amd-install-test
Thank you guys!!!