[SOLVED] Runtime Error: StableCascadeCombinedPipeline: Expected all tensors to be on the same device

In a nutshell: Passing an image to StableCascadeCombinedPipeline raises a runtime error saying that not all tensors are on the same device. The app works perfectly if I comment out the image argument so that it relies only on the text prompt, i.e. as a text-to-image generator.

A gist of the app code (60 lines of python) with the image input commented out is visible here

The doc for the pipeline defines the optional image argument as: images (torch.Tensor, PIL.Image.Image, List[torch.Tensor], List[PIL.Image.Image], optional) — The images to guide the image generation for the prior. I'm passing a PIL.Image.Image by way of a Gradio Image Component.

Since the app runs when no image is passed, it seems I need to make sure the image ends up on the CUDA device, but so far I haven't found any instructions for how to do that.

Here's the part of the code that sets up the pipeline and defines the generate function:

import random

import spaces
import torch
from diffusers import StableCascadeCombinedPipeline

# Constants
repo = "stabilityai/stable-cascade"

# Load the pipeline and move it to the GPU when one is available
if torch.cuda.is_available():
    pipe = StableCascadeCombinedPipeline.from_pretrained(repo, variant="bf16", torch_dtype=torch.bfloat16)
    pipe.to("cuda")

# The generate function (the image input is commented out here)
@spaces.GPU(enable_queue=True)
def generate_image(prompt):
#def generate_image(prompt, images):
    # Pick a random seed for each call
    seed = random.randint(-100000, 100000)

    results = pipe(
        prompt=prompt,
        #images=[images],
        height=1024,
        width=1024,
        num_inference_steps=20,
        generator=torch.Generator(device="cuda").manual_seed(seed),
    )
    return results.images[0]

Solution

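The issue turned out to be that pipe.to('cuda') does not move one required component, the prior image encoder, to the GPU, so it stays on the CPU while the rest of the pipeline runs on CUDA. Adding this line after the pipeline is created resolves the problem: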

pipe.prior_image_encoder.to('cuda')
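For anyone hitting a similar device mismatch, here is a rough sketch of a way to spot which sub-model was left behind. The helper name find_off_device is made up for illustration and is not part of diffusers. It assumes the pipeline keeps its sub-models as plain instance attributes and that each one exposes .parameters() the way a torch.nn.Module does. I have not checked it against every pipeline class.

```python
def find_off_device(pipe, device="cuda"):
    """Return {attribute_name: device_string} for sub-models not on `device`.

    Hypothetical diagnostic helper (not a diffusers API). It duck-types any
    attribute that exposes .parameters(), as torch.nn.Module does, and skips
    everything else (schedulers, tokenizers, None, and so on).
    """
    off_device = {}
    for name, component in vars(pipe).items():
        parameters = getattr(component, "parameters", None)
        if not callable(parameters):
            continue  # not a model-like object
        first = next(iter(parameters()), None)
        if first is None:
            continue  # a module without any parameters
        where = str(first.device)  # e.g. "cpu" or "cuda:0"
        if not where.startswith(device):
            off_device[name] = where
    return off_device


# Intended usage with the pipeline from this post (needs a GPU):
#   pipe.to("cuda")
#   print(find_off_device(pipe))   # lists anything still off the GPU
#   pipe.prior_image_encoder.to("cuda")
```

If the helper returns an empty dict after the extra .to('cuda') call, every sub-model it can see is on the GPU.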

A complete, working app.py and requirements.txt are posted at https://github.com/huggingface/diffusers/issues/7598#issuecomment-2042897916