I am attempting to make a Gradio demo for nanoLLaVA by @stablequan, porting over just the structure of the Apache-2.0-licensed code in the Moondream repo.
The nanoLLaVA repo ships example code, which I used to write this script. That works standalone and gives a reasonable output. But when I run the same code from Gradio here, I get an error about a mismatch in devices.
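For context, the Gradio side of my script is essentially the following. The prompt and image handling is reconstructed from the nanoLLaVA example code, so treat the exact details (model.process_images, the -200 placeholder, the generate kwargs) as an approximation rather than my verbatim script:

import torch
import gradio as gr
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.set_default_device('cuda')  # as in the nanoLLaVA example script

model = AutoModelForCausalLM.from_pretrained(
    'qnguyen3/nanoLLaVA',
    torch_dtype=torch.float16,
    device_map='auto',
    trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained('qnguyen3/nanoLLaVA', trust_remote_code=True)

def answer_question(image, question):
    # same steps as the example script, but now running on a Gradio worker thread
    messages = [{"role": "user", "content": f'<image>\n{question}'}]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    text_chunks = [tokenizer(chunk).input_ids for chunk in text.split('<image>')]
    # -200 is the image-token placeholder the example code splices in
    input_ids = torch.tensor(text_chunks[0] + [-200] + text_chunks[1], dtype=torch.long).unsqueeze(0)
    image_tensor = model.process_images([image], model.config).to(dtype=model.dtype)
    output_ids = model.generate(input_ids, images=image_tensor, max_new_tokens=2048, use_cache=True)[0]
    return tokenizer.decode(output_ids[input_ids.shape[1]:], skip_special_tokens=True).strip()

demo = gr.Interface(fn=answer_question,
                    inputs=[gr.Image(type='pil'), gr.Textbox(label='Question')],
                    outputs='text')
demo.launch()

Submitting an image and a question then produces this traceback: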
Traceback (most recent call last):
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\gradio\queueing.py", line 495, in call_prediction
output = await route_utils.call_process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\gradio\route_utils.py", line 232, in call_process_api
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\gradio\blocks.py", line 1561, in process_api
result = await self.call_function(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\gradio\blocks.py", line 1179, in call_function
prediction = await anyio.to_thread.run_sync(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\anyio\to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\gradio\utils.py", line 678, in wrapper
response = f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "C:\Users\Moo\Downloads\llm\nanollava\nanollava_gradio_demo.py", line 46, in answer_question
output_ids = model.generate(
^^^^^^^^^^^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\generation\utils.py", line 1575, in generate
result = self._sample(
^^^^^^^^^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\generation\utils.py", line 2697, in _sample
outputs = self(
^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Moo\.cache\huggingface\modules\transformers_modules\qnguyen3\nanoLLaVA\4a1bd2e2854c6df9c4af831a408b14f7b035f4c0\modeling_llava_qwen2.py", line 2267, in forward
) = self.prepare_inputs_labels_for_multimodal(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Moo\.cache\huggingface\modules\transformers_modules\qnguyen3\nanoLLaVA\4a1bd2e2854c6df9c4af831a408b14f7b035f4c0\modeling_llava_qwen2.py", line 687, in prepare_inputs_labels_for_multimodal
image_features = self.encode_images(images).to(self.device)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Moo\.cache\huggingface\modules\transformers_modules\qnguyen3\nanoLLaVA\4a1bd2e2854c6df9c4af831a408b14f7b035f4c0\modeling_llava_qwen2.py", line 661, in encode_images
image_features = self.get_model().mm_projector(image_features)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\container.py", line 217, in forward
input = module(input)
^^^^^^^^^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\linear.py", line 116, in forward
return F.linear(input, self.weight, self.bias)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat2 in method wrapper_CUDA_mm)
Now, if you've seen this before and read through the error message, you might be able to tell me immediately: "oh, obviously it's running on a different thread, and set_default_device() doesn't carry over to that thread". The issue here is related to this one; I'm not sure which version the fix applies to. But either way, since the default is "cpu", if everything is on cpu that should be fine, right?
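You can demonstrate the thread-local behavior in isolation; a minimal sketch, assuming a CUDA build of PyTorch:

import threading
import torch

torch.set_default_device('cuda')
print(torch.empty(1).device)  # cuda:0 -- the default holds on the main thread

def worker():
    # set_default_device() is thread-local state, so a fresh tensor
    # created on this thread falls back to the CPU
    print(torch.empty(1).device)  # cpu

t = threading.Thread(target=worker)
t.start()
t.join()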
So I modified the AutoModelForCausalLM call to use device_map='cpu', and printed the .device of both the inputs and the model before executing model.generate (the exact prints are sketched after the output below):
<|im_start|>system
Answer the questions.<|im_end|><|im_start|>user
<image>
What do you see?<|im_end|><|im_start|>assistant
cpu
cpu
cpu
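The check itself was roughly the following; the variable names follow the example code, so this is an approximation of my script:

print(text)                 # the rendered chat prompt
print(model.device)         # cpu
print(input_ids.device)     # cpu
print(image_tensor.device)  # cpu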
So what you need to do is figure out a way to execute set_default_device on that thread, or, as seen here, you can use set_default_tensor_type instead, which works:
import torch

# set device
torch.set_default_device('cuda')  # or 'cpu'
# unlike set_default_device, this one also carries over to Gradio's worker thread
torch.set_default_tensor_type('torch.cuda.FloatTensor')
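The difference appears to be that set_default_device() pushes thread-local state, while set_default_tensor_type() changes a process-wide default, so it is still in effect on the anyio worker thread that Gradio uses (note that set_default_tensor_type is deprecated in recent PyTorch releases, though it still works). Alternatively, you can make the thread-local version run on the right thread by calling it at the top of the handler itself; a minimal sketch:

import torch

def answer_question(image, question):
    # Gradio invokes this handler on a worker thread, so the thread-local
    # default set here is visible to everything the handler calls
    torch.set_default_device('cuda')
    # ... then build input_ids / image_tensor and call model.generate as before ...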