python python-3.x huggingface-transformers kaggle

Error while running Hugging Face models in a Kaggle notebook


I am using the Llama 3 model from the Hugging Face library in a Kaggle notebook and get the error below when running the pipeline. I have trimmed most of the stack trace, because otherwise the question could not be posted with that much code and so little description.

RuntimeError                              Traceback (most recent call last)
Cell In[19], line 17, in Llama_Chat(system_role, user_msg)
     12 def Llama_Chat(system_role,user_msg):
     13   messages = [
     14     {"role": "system", "content": system_role},
     15     {"role": "user", "content": user_msg},
     16   ]
---> 17   outputs = pipeline(
     18       messages,
     19       max_new_tokens=256,
     20       temperature = 0.1
     21 
     22   )
     24   reply=outputs[0]["generated_text"][-1]["content"]
     25   return reply

File /opt/conda/lib/python3.10/site-packages/accelerate/hooks.py:169, in add_hook_to_module.<locals>.new_forward(module, *args, **kwargs)
    167         output = module._old_forward(*args, **kwargs)
    168 else:
--> 169     output = module._old_forward(*args, **kwargs)
    170 return module._hf_hook.post_forward(module, output)

File /opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py:603, in LlamaSdpaAttention.forward(self, hidden_states, attention_mask, position_ids, past_key_value, output_attentions, use_cache, cache_position, position_embeddings, **kwargs)
    599 # We dispatch to SDPA's Flash Attention or Efficient kernels via this `is_causal` if statement instead of an inline conditional assignment
    600 # in SDPA to support both torch.compile's dynamic shapes and full graph options. An inline conditional prevents dynamic shapes from compiling.
    601 is_causal = True if causal_mask is None and q_len > 1 else False
--> 603 attn_output = torch.nn.functional.scaled_dot_product_attention(
    604     query_states,
    605     key_states,
    606     value_states,
    607     attn_mask=causal_mask,
    608     dropout_p=self.attention_dropout if self.training else 0.0,
    609     is_causal=is_causal,
    610 )
    612 attn_output = attn_output.transpose(1, 2).contiguous()
    613 attn_output = attn_output.view(bsz, q_len, -1)

RuntimeError: cutlassF: no kernel found to launch!

This is the error I get when running Hugging Face models with the transformers library on Kaggle. I have checked the CUDA and PyTorch versions, and they look fine. ChatGPT, Claude, etc. all suggest a version mismatch, but I am making no progress.


Solution

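  • Try disabling the memory-efficient and flash SDPA backends before calling the pipeline, so that scaled_dot_product_attention no longer dispatches to those fused kernels:

    import torch
    # Turn off the fused attention kernels; PyTorch then uses its remaining (math) backend.
    torch.backends.cuda.enable_mem_efficient_sdp(False)
    torch.backends.cuda.enable_flash_sdp(False)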
    

    Source
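
As a rough illustration of the workaround, here is a minimal sketch that assumes PyTorch 2.x. The helper name `force_math_sdpa` is made up for this example and is not part of any library. It turns off the two fused backends, reports which ones are still enabled, and then runs a tiny attention call to confirm that a kernel is found. It does not load Llama 3, so run it in a cell before the pipeline is created.

```python
import torch
import torch.nn.functional as F


def force_math_sdpa() -> dict:
    """Hypothetical helper: disable the fused SDPA kernels and report backend status."""
    torch.backends.cuda.enable_flash_sdp(False)
    torch.backends.cuda.enable_mem_efficient_sdp(False)
    # The math backend is on by default; enabling it explicitly guards against
    # an earlier cell having switched it off.
    torch.backends.cuda.enable_math_sdp(True)
    return {
        "flash": torch.backends.cuda.flash_sdp_enabled(),
        "mem_efficient": torch.backends.cuda.mem_efficient_sdp_enabled(),
        "math": torch.backends.cuda.math_sdp_enabled(),
    }


status = force_math_sdpa()
print(status)

# Tiny smoke test with arbitrary shapes: (batch, heads, sequence length, head dim).
device = "cuda" if torch.cuda.is_available() else "cpu"
q = torch.randn(1, 2, 4, 8, device=device)
k = torch.randn(1, 2, 4, 8, device=device)
v = torch.randn(1, 2, 4, 8, device=device)
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)
```

These flags are process-wide, so they stay in effect for every later attention call in the notebook session. The math backend is typically slower and uses more memory than the fused kernels, which may matter for long prompts.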