I am running StableDiffusionXLPipeline from the diffusers Python package, and the output I get is a 128x128 PNG full of colorful noise.
The SDXL model I am referencing was pulled directly from HuggingFace so that I can run it locally. I expect to receive a picture matching my prompt, which in this case is "A majestic Trex overlooking a jungle".
The command I used within the "stable_diffusion" folder to download the SDXL model is as follows:
git lfs clone https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0
My code is as follows:
import torch
from diffusers import StableDiffusionXLPipeline
import torchvision.transforms as transforms


def main():
    print(f"CUDA: {torch.cuda.is_available()}")
    torch.cuda.empty_cache()
    base_directory = "stable_diffusion/stable-diffusion-xl-base-1.0"

    # Load base model
    base = StableDiffusionXLPipeline.from_pretrained(
        pretrained_model_name_or_path=base_directory,
        # pretrained_model_or_path=base_directory,  # for AutoPipeline
        torch_dtype=torch.float16,
        variant="fp16",
        use_safetensors=True,
        local_files_only=True,
        cache_dir="stable_diffusion",
    )
    base.enable_model_cpu_offload()
    base.enable_xformers_memory_efficient_attention()
    base.enable_vae_slicing()

    # Parameters
    n_steps = 15
    high_noise_frac = 0.8
    prompt = "A majestic Trex overlooking a jungle"

    # Generate base image
    image = base(
        prompt=prompt,
        num_inference_steps=n_steps,
        denoising_end=high_noise_frac,
        output_type="latent",
    ).images[0]

    # Clear GPU cache (again)
    torch.cuda.empty_cache()

    # Convert tensor to PIL Image
    image_pil = transforms.ToPILImage()(image.cpu().squeeze(0))

    # Save the image
    image_pil.save("test_image.png")

    # Cleanup
    del base
    del image
    del image_pil


if __name__ == "__main__":
    main()
So far I have confirmed that CUDA is enabled and working (checked with torch.cuda.is_available() and through CUDA usage in Task Manager).
I have attempted saving the image using the documentation-recommended image.save("test.png"), but I get an AttributeError that says "'Tensor' object has no attribute 'save'. Did you mean: 'ravel'?".
I have also tried other classes from the diffusers package, such as StableDiffusionPipeline, with no luck.
Lastly, I tried changing n_steps and the noise fraction, and got the same noisy, colorful output.
Attached below is my current list of packages in Python 3.10.4:
accelerate==0.22.0
aiohttp==3.8.5
aiosignal==1.3.1
altgraph==0.17.3
appdirs==1.4.4
art==6.0
async-timeout==4.0.3
attrs==23.1.0
audioread==3.0.0
Brotli==1.0.9
cachetools==5.3.1
certifi==2023.7.22
cffi==1.15.1
charset-normalizer==3.2.0
click==8.1.7
colorama==0.4.6
decorator==4.4.2
diffusers==0.20.2
docker-pycreds==0.4.0
filelock==3.12.3
frozenlist==1.4.0
fsspec==2023.9.0
gitdb==4.0.10
GitPython==3.1.36
google-api-core==2.11.1
google-auth==2.22.0
google-cloud==0.34.0
google-cloud-core==2.3.3
google-cloud-speech==2.21.0
google-cloud-storage==2.10.0
google-crc32c==1.5.0
google-resumable-media==2.5.0
googleapis-common-protos==1.60.0
grpcio==1.57.0
grpcio-status==1.57.0
huggingface-hub==0.17.1
idna==3.4
imageio==2.31.1
imageio-ffmpeg==0.4.8
importlib-metadata==6.8.0
Jinja2==3.1.2
MarkupSafe==2.1.2
moviepy==1.0.3
mpmath==1.2.1
multidict==6.0.4
mutagen==1.46.0
networkx==3.0
numpy==1.25.2
openai==0.27.8
packaging==23.1
pathtools==0.1.2
pefile==2023.2.7
Pillow==10.0.0
pocketsphinx==5.0.2
proglog==0.1.10
proto-plus==1.22.3
protobuf==4.24.0
psutil==5.9.5
setproctitle==1.3.2
six==1.16.0
smmap==5.0.1
sounddevice==0.4.6
soundfile==0.12.1
sympy==1.11.1
tokenizers==0.13.3
torch==2.0.1+cu117
torchaudio==2.0.2+cu117
torchvision==0.15.2+cu117
tqdm==4.66.1
transformers==4.33.1
typing_extensions==4.7.1
urllib3==1.26.16
wandb==0.15.10
websockets==11.0.3
xformers==0.0.21
yarl==1.9.2
youtube-dl==2021.12.17
yt-dlp==2023.7.6
zipp==3.16.2
And here is a picture of the current output: [image: Output of the current StableDiffusionXL model]
Get rid of output_type="latent". I don't even see that option in the documentation for StableDiffusionXLPipeline, but we can guess what it does from the name: it returns the latent representation instead of the image. The default is output_type="pil", which will return a PIL.Image. That will also solve your problem of image.save not working.
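As a rough sketch of what that could look like (not tested against your exact setup), the snippet below keeps your loading arguments and drops output_type="latent". The names generate_image and latent_side are made up for illustration. I also dropped denoising_end, on the assumption that you are not passing the result on to a refiner; with denoising_end=0.8 the base pipeline stops early and hands back a result that still contains noise. The helper assumes the usual 8x VAE downscaling, which would explain the 128x128 size: a 1024x1024 image corresponds to a 128x128 latent.

```python
def latent_side(pixel_side, vae_scale_factor=8):
    """Side length of the latent grid for a given image side.

    Assumes the VAE downsamples by a factor of 8 per side.
    """
    return pixel_side // vae_scale_factor


def generate_image(
    prompt,
    model_dir="stable_diffusion/stable-diffusion-xl-base-1.0",
    steps=30,
    out_path="test_image.png",
):
    """Hypothetical helper: load SDXL from a local folder and save a PNG."""
    # Imported here so the module can be loaded without the GPU stack installed.
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        model_dir,
        torch_dtype=torch.float16,
        variant="fp16",
        use_safetensors=True,
        local_files_only=True,
    )
    pipe.enable_model_cpu_offload()

    # No output_type="latent" and no denoising_end: the default
    # output_type="pil" decodes the latents and returns PIL images.
    image = pipe(prompt=prompt, num_inference_steps=steps).images[0]

    # A PIL.Image has .save(), so no tensor conversion is needed.
    image.save(out_path)
    return out_path
```

Calling generate_image("A majestic Trex overlooking a jungle") should then write a full-size PNG rather than a 128x128 latent; the step count of 30 is just a placeholder value.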