Tags: python, matplotlib, precision, axis-labels, floating-accuracy

imshow plotting very large integers, but "dtype object cannot be converted to float"


I have the following code, plotting a function on a grid, where the function happens to have a very large integer value:

import matplotlib.pyplot as plt
from matplotlib.ticker import ScalarFormatter, FuncFormatter
import numpy as np # thanks to user @simon pointing out I had forgotten this

p = 13
counts = [[0 for x in range(p)] for y in range(p)]
counts[0][0] = 1000000000
unique_counts = np.unique(counts)
plt.imshow(counts, cmap='viridis', origin='lower', extent=[0, p-1, 0, p-1])
cbar = plt.colorbar(ticks=unique_counts, format=ScalarFormatter(useOffset=False))
cbar.ax.yaxis.set_major_formatter(FuncFormatter(lambda x, _: format(int(x), ',')))  # Format tick labels with commas
plt.show()

In Google Colab, this runs without problems and produces the expected plot.

However, if I bump the value up to, say, counts[0][0] = 100000000000000000000, I get the following error:

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-12-0ec4c2551685> in <cell line: 8>()
      6 counts[0][0] = 100000000000000000000
      7 unique_counts = np.unique(counts)
----> 8 plt.imshow(counts, cmap='viridis', origin='lower', extent=[0, p-1, 0, p-1])
      9 cbar = plt.colorbar(ticks=unique_counts, format=ScalarFormatter(useOffset=False))
     10 cbar.ax.yaxis.set_major_formatter(FuncFormatter(lambda x, _: format(int(x), ',')))  # Format tick labels with commas

3 frames

/usr/local/lib/python3.10/dist-packages/matplotlib/image.py in set_data(self, A)
    699         if (self._A.dtype != np.uint8 and
    700                 not np.can_cast(self._A.dtype, float, "same_kind")):
--> 701             raise TypeError("Image data of dtype {} cannot be converted to "
    702                             "float".format(self._A.dtype))
    703 

TypeError: Image data of dtype object cannot be converted to float

I would like very much to be able to plot functions that take very large integer values with exact precision (so rounding/using floats would not be good). Is this possible?

EDIT: someone was understandably confused by this seemingly useless level of precision in a plot. To clarify: what matters to me is being able to read the exact value off the colorbar labels (for number theory applications, I need an exact count of the number of points on some varieties mod p). So I'm OK with the plot itself being slightly off, but I really do want the colorbar labels to be exact.


Solution

  • New answer

    (For my original answer, see the section below.)

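    Based on the question's update, which made clear that the essential information to retain is the exact integer values on the colorbar tick labels, here is my updated answer. Its crucial idea is to pass the image data and the tick positions to Matplotlib as float values (which avoids the casting error), while generating the tick labels as strings from the original integer values (which keeps them exact).

    Here is the corresponding code: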

    import matplotlib.pyplot as plt
    from matplotlib.ticker import ScalarFormatter
    import numpy as np
    
    p = 13
    counts = [[0 for x in range(p)] for y in range(p)]
    # Provide some huge ints for demonstration purposes
    counts[ 0][ 0] = 100000000000000000008
    counts[ 0][-1] = counts[ 0][ 0] // 2
    counts[-1][ 0] = counts[ 0][-1] // 2
    counts[-1][-1] = counts[-1][ 0] // 2
    # Get the unique values (without Numpy, just to be sure)
    unique_counts = sorted(set(val for row in counts for val in row))
    # Provide the image and tick *positions* as float values to avoid casting error
    counts_img = np.array(counts, dtype=float)
    counts_ticks = [float(val) for val in unique_counts]
    # Provide the tick *labels* as strings generated from the original integer vals
    counts_ticks_labels = [f'{val:,}' for val in unique_counts]
    # Display everything
    plt.imshow(counts_img, cmap='viridis', origin='lower', extent=[0, p-1, 0, p-1])
    cbar = plt.colorbar(format=ScalarFormatter(useOffset=False))
    cbar.set_ticks(ticks=counts_ticks, labels=counts_ticks_labels)
    plt.show()
    

    In older versions of Matplotlib, you might need to adjust the last three lines as follows:

    cbar = plt.colorbar(ticks=counts_ticks, format=ScalarFormatter(useOffset=False))
    cbar.ax.set_yticklabels(counts_ticks_labels)
    plt.show()
    

    And here is the resulting plot: [image: plot resulting from provided code]
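The reason the labels have to be built from the original integers, rather than from the float tick positions (as the `int(x)` formatter in the question does), can be checked in plain Python. The sketch below is only an illustration: `exact_colorbar_ticks` is a hypothetical helper name, not part of Matplotlib. It also covers one edge case the code above does not address, namely that two distinct huge integers can round to the same float and therefore to the same tick position; merging their labels with " / " is an arbitrary choice made here.

```python
def exact_colorbar_ticks(values):
    """Return (positions, labels) for the distinct integers in `values`.

    Hypothetical helper: positions are floats (safe to hand to Matplotlib),
    labels are strings built from the exact integers.
    """
    by_position = {}
    for val in sorted(set(values)):
        # Integers that round to the same float share one tick position
        by_position.setdefault(float(val), []).append(val)
    positions = list(by_position)
    labels = [" / ".join(f"{v:,}" for v in vals) for vals in by_position.values()]
    return positions, labels


n = 100000000000000000008
# The round trip through float loses the trailing 8
print(int(float(n)) == n)  # False

positions, labels = exact_colorbar_ticks([0, n, n - 8])
print(positions)
print(labels)
```

The returned lists could be passed to `cbar.set_ticks(ticks=positions, labels=labels)` in place of `counts_ticks` and `counts_ticks_labels`.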

    Original answer

    Short answer

    I currently do not see a way to pass huge integers to imshow() exactly, because Matplotlib internally relies on Numpy arrays to hold the image data. If you can live with approximate values, use

    counts[0][0] = float(100000000000000000000)
    

    Long answer

    The reason for the error that you see is that your nested list of image data is internally converted to a Numpy array by Matplotlib before displaying it. In Matplotlib's current version, this happens in cbook.safe_masked_invalid(), which is called by _ImageBase._normalize_image_array(), which is called by _ImageBase.set_data(), which is called by Axes.imshow().

    The chain of problems here is the following:

    1. Huge integers (i.e. integers that fit into neither Numpy's int64 nor its uint64 data type) are converted to Numpy's object data type by default. This happens for your data with counts[0][0] = 100000000000000000000, but not with counts[0][0] = 1000000000. You can easily check the corresponding Numpy behavior as follows:

      str(np.array([100000000000000000000]).dtype)
      # >>> 'object'
      str(np.array([1000000000]).dtype)
      # >>> 'int64'
      

      In Matplotlib, as already mentioned, this happens in cbook.safe_masked_invalid(); more precisely, it happens in the line x = np.array(x, subok=True, copy=copy), where x refers to your nested list counts.

    2. After that, _ImageBase._normalize_image_array() checks whether the resulting array's data type is uint8 or can be cast to the float data type. Neither is true for Numpy's object data type, so the error is raised.

    To avoid this chain of problems, the only possibility that I see is converting your data to float values or to a float array yourself, once the values become too big, before passing them to imshow().
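A minimal sketch of such a guard follows. It reuses the same `np.can_cast(..., float, "same_kind")` test that appears in the traceback; `to_imshow_array` is a hypothetical helper name introduced here for illustration, and the conversion it performs is lossy for integers beyond 2**53.

```python
import numpy as np


def to_imshow_array(data):
    """Return `data` as an array that imshow() accepts (hypothetical helper)."""
    arr = np.asarray(data)
    # Same condition as in the traceback: uint8 passes as-is,
    # any other dtype must be castable to float
    if arr.dtype != np.uint8 and not np.can_cast(arr.dtype, float, "same_kind"):
        # Lossy step: huge Python ints are rounded to the nearest float64
        arr = arr.astype(float)
    return arr


small = to_imshow_array([[0, 1000000000]])
huge = to_imshow_array([[0, 100000000000000000000]])
print(small.dtype)  # a native integer dtype, left untouched
print(huge.dtype)   # float64
```

Data that Numpy can already hold in a native integer type is returned unchanged, so the float conversion only kicks in once the values become too big.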