Tags: python, pytorch, pipeline, huggingface, stable-diffusion

Locally run Python StableDiffusionXL outputs a noisy image


I am running StableDiffusionXLPipeline from the diffusers Python package, and the output is a PNG full of colorful noise. The PNG has dimensions 128x128.

The SDXL model I am referencing is pulled directly from HuggingFace with the goal of running locally. I am expecting to receive a picture of my prompt, which in this case is "A majestic Trex overlooking a jungle".

The command I used within the "stable_diffusion" folder to download the SDXL model is as follows:

git lfs clone https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0

My code is as follows:

import torch
from diffusers import StableDiffusionXLPipeline
import torchvision.transforms as transforms

def main():
    print(f"CUDA: {torch.cuda.is_available()}")

    torch.cuda.empty_cache()
    base_directory = "stable_diffusion/stable-diffusion-xl-base-1.0"

    # Load base model
    base = StableDiffusionXLPipeline.from_pretrained(
        pretrained_model_name_or_path=base_directory,
        # pretrained_model_or_path=base_directory,  #for AutoPipeline
        torch_dtype=torch.float16,
        variant="fp16",
        use_safetensors=True,
        local_files_only=True,
        cache_dir="stable_diffusion",
    )
    base.enable_model_cpu_offload()
    base.enable_xformers_memory_efficient_attention()
    base.enable_vae_slicing()

    # Parameters
    n_steps = 15
    high_noise_frac = 0.8
    prompt = "A majestic Trex overlooking a jungle"

    # Generate base image
    image = base(
        prompt=prompt,
        num_inference_steps=n_steps,
        denoising_end=high_noise_frac,
        output_type="latent",
    ).images[0]

    # Clear GPU cache (again)
    torch.cuda.empty_cache()

    # Convert tensor to PIL Image
    image_pil = transforms.ToPILImage()(image.cpu().squeeze(0))

    # Save the image
    image_pil.save("test_image.png")

    #Cleanup
    del base
    del image
    del image_pil

if __name__ == "__main__":
    main()

Below is my current list of installed packages for Python 3.10.4:

accelerate==0.22.0
aiohttp==3.8.5  
aiosignal==1.3.1  
altgraph==0.17.3  
appdirs==1.4.4  
art==6.0  
async-timeout==4.0.3  
attrs==23.1.0  
audioread==3.0.0  
Brotli==1.0.9  
cachetools==5.3.1  
certifi==2023.7.22  
cffi==1.15.1  
charset-normalizer==3.2.0
click==8.1.7  
colorama==0.4.6  
decorator==4.4.2  
diffusers==0.20.2  
docker-pycreds==0.4.0  
filelock==3.12.3  
frozenlist==1.4.0
fsspec==2023.9.0
gitdb==4.0.10
GitPython==3.1.36
google-api-core==2.11.1
google-auth==2.22.0
google-cloud==0.34.0
google-cloud-core==2.3.3
google-cloud-speech==2.21.0
google-cloud-storage==2.10.0
google-crc32c==1.5.0
google-resumable-media==2.5.0
googleapis-common-protos==1.60.0
grpcio==1.57.0
grpcio-status==1.57.0
huggingface-hub==0.17.1
idna==3.4
imageio==2.31.1
imageio-ffmpeg==0.4.8
importlib-metadata==6.8.0
Jinja2==3.1.2
MarkupSafe==2.1.2
moviepy==1.0.3
mpmath==1.2.1
multidict==6.0.4
mutagen==1.46.0
networkx==3.0
numpy==1.25.2
openai==0.27.8
packaging==23.1
pathtools==0.1.2
pefile==2023.2.7
Pillow==10.0.0
pocketsphinx==5.0.2
proglog==0.1.10
proto-plus==1.22.3
protobuf==4.24.0
psutil==5.9.5
setproctitle==1.3.2
six==1.16.0
smmap==5.0.1
sounddevice==0.4.6
soundfile==0.12.1
sympy==1.11.1
tokenizers==0.13.3
torch==2.0.1+cu117
torchaudio==2.0.2+cu117
torchvision==0.15.2+cu117
tqdm==4.66.1
transformers==4.33.1
typing_extensions==4.7.1
urllib3==1.26.16
wandb==0.15.10
websockets==11.0.3
xformers==0.0.21
yarl==1.9.2
youtube-dl==2021.12.17
yt-dlp==2023.7.6
zipp==3.16.2

And here is a picture of the current output: [image: Output of the current StableDiffusionXL model]


Solution

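  • Get rid of output_type="latent". I don't see that option listed in the documentation for StableDiffusionXLPipeline, but the name tells us what it does: it returns the latent representation instead of the decoded image. For SDXL's default 1024x1024 resolution the latent is a 4-channel 128x128 tensor, which is why converting it with ToPILImage gives a 128x128 PNG of colorful noise. The default is output_type="pil", which returns a PIL.Image, so you can call image.save("test_image.png") directly and drop the ToPILImage conversion.
  • Also remove denoising_end=high_noise_frac unless you hand the result to a refiner pipeline. With denoising_end=0.8 the base model stops before the last 20% of the denoising schedule, so even a decoded image would still contain visible noise.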
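
The fix can be sketched as follows. This is a minimal sketch, not a definitive implementation: it assumes the same local model directory as in the question, and the helper names latent_size and generate are hypothetical, introduced only for illustration. The factor of 8 is the downsampling factor of the SDXL VAE, which is why a 1024x1024 request yields 128x128 latents. The diffusers and torch imports sit inside generate so that the size helper can be checked without a GPU setup.

```python
def latent_size(height=1024, width=1024, vae_scale_factor=8):
    """Spatial size (height, width) of the latent tensor for a given image size.

    Assumption: the VAE downsamples by a factor of 8, as the SDXL VAE does.
    """
    return height // vae_scale_factor, width // vae_scale_factor


def generate(model_dir, prompt, n_steps=15):
    """Run the SDXL base model alone and return a decoded PIL image.

    Hypothetical helper: model_dir is the locally cloned model folder.
    """
    # Lazy imports: only needed when an image is actually generated.
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        model_dir,
        torch_dtype=torch.float16,
        variant="fp16",
        use_safetensors=True,
        local_files_only=True,
    )
    pipe.enable_model_cpu_offload()

    # No output_type="latent" and no denoising_end here: the pipeline runs
    # the full schedule and decodes the latents with the VAE, so .images[0]
    # is already a PIL.Image.
    result = pipe(prompt=prompt, num_inference_steps=n_steps)
    return result.images[0]
```

With the model folder from the question, generate("stable_diffusion/stable-diffusion-xl-base-1.0", "A majestic Trex overlooking a jungle").save("test_image.png") should then write a full-resolution image, and latent_size() shows where the 128x128 in the original output came from.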