Tags: python, torch, torchvision

ComfyUI: Python/torch resizing image - height adjusts width and width adjusts colour?


First time here. I am new to Python and am trying to resize an image for a ComfyUI node using the following:

import torch
import torchvision.transforms.functional as TF
from PIL import Image

class ImageResize:
    def __init__(self):
        pass

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "image": ("IMAGE",),
                "height": ("INT", {"min": 1}),
                "width": ("INT", {"min": 1}),
            },
        }

    RETURN_TYPES = ("IMAGE",)
    FUNCTION = "resize_image"
    CATEGORY = "ImageProcessing"

    def resize_image(self, image, height, width):
        input_image = image
        resized_image = TF.resize(input_image, (height, width))
        return (resized_image,)

Using the above, when I adjust the height, the width changes, and when I adjust the width, the colours change instead:

A width of 1 throws: Cannot handle this data type: (1, 1, 1), |u1

A width of 2 shows a black and white image

A width of 3 shows a full colour image

A width of 4 shows a dimmed colour image

A width of 5 or higher throws the same error as 1, with the last value incrementing to match, e.g.: Cannot handle this data type: (1, 1, 5), |u1

What am I doing wrong?

Not sure why height is adjusting the width while width is doing something completely different.


Solution

  • The problem here is that torchvision and ComfyUI use different memory layouts for image data. When representing colour images as multidimensional arrays, there are two commonly used approaches: channel-last format (image arrays of shape [H, W, C]), used in libraries like PIL, OpenCV and TensorFlow, and channel-first format ([C, H, W]), used mostly by torch and torchvision.
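
    You can see the two conventions side by side (a minimal sketch; the image size is arbitrary):

    import numpy as np
    import torch
    from PIL import Image

    # channel-last: PIL/NumPy convention
    pil_image = Image.new("RGB", (640, 480))            # width=640, height=480
    array = np.asarray(pil_image)
    print(array.shape)                                  # (480, 640, 3) -> [H, W, C]

    # channel-first: torch/torchvision convention
    tensor = torch.from_numpy(array.copy()).permute(2, 0, 1)
    print(tensor.shape)                                 # torch.Size([3, 480, 640]) -> [C, H, W]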

    If we look at the implementation of ComfyUI's LoadImage node, we see that the images are opened with PIL.Image.open, so the resulting array will be in channel-last format. In your custom node you are using torchvision's resize operator, which expects its input to be channel-first.
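
    Condensed, LoadImage does roughly the following (paraphrased from memory, not the verbatim source; "example.png" is just a placeholder):

    from PIL import Image
    import numpy as np
    import torch

    img = Image.open("example.png").convert("RGB")      # channel-last PIL image
    arr = np.asarray(img).astype(np.float32) / 255.0    # [H, W, C], floats in [0, 1]
    tensor = torch.from_numpy(arr)[None,]               # [B, H, W, C] - the IMAGE your node receives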

    The resize operation itself is just an interpolation and can handle inputs of arbitrary shape. In your case, however, because you pass an image with the wrong memory layout, trying to change the height changes the width, and trying to change the width changes the number of channels. Hence the weird colours you are seeing: "resizing" the width to 4 adds a fourth channel, which gets interpreted as alpha (transparency) and makes the image appear dimmed, while widths of 1 or of 5 and above produce channel counts that PIL cannot convert back to an image, hence the Cannot handle this data type errors.
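
    This is easy to reproduce directly (a minimal sketch with made-up dimensions): TF.resize always treats the last two dimensions as (H, W), so on a [B, H, W, C] tensor it resizes the width and channel axes instead.

    import torch
    import torchvision.transforms.functional as TF

    batch = torch.rand(1, 480, 640, 3)   # ComfyUI layout: [B, H, W, C]
    # the last two dims (640, 3) are treated as (H, W), so "height"
    # actually resizes the width axis and "width" resizes the channels
    out = TF.resize(batch, (512, 4))
    print(out.shape)                     # torch.Size([1, 480, 512, 4])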

    For the desired behaviour, you need to bring the image into the shape TF.resize expects, i.e. permute the tensor to channel-first before resizing, then permute it back afterwards.

    # note that this example expects images to always have a batch
    # dimension and a colour channel
    def resize_image(self, image, height, width):
        # [B, H, W, C] -> [B, C, H, W]
        input_image = image.permute((0, 3, 1, 2))
        resized_image = TF.resize(input_image, (height, width))
        # [B, C, H, W] -> [B, H, W, C]
        resized_image = resized_image.permute((0, 2, 3, 1))
        return (resized_image,)
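
    A quick way to sanity-check the fixed node (the shapes here are illustrative):

    import torch

    image = torch.rand(1, 480, 640, 3)   # [B, H, W, C], as ComfyUI provides
    node = ImageResize()
    (resized,) = node.resize_image(image, height=256, width=256)
    print(resized.shape)                 # torch.Size([1, 256, 256, 3])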
    

    This is a good example of why the full stack trace should be provided when asking questions: the underlying problem was not in your implementation of the node, but in the discrepancy between how ComfyUI and torchvision handle images.