pandas markdown tabulate

pandas.DataFrame.to_markdown transforms large ints to floats


pandas.DataFrame.to_markdown renders large integers as floats in scientific notation. Is this a bug or a feature? Is there a way to avoid it?

>>> df = pd.DataFrame({"A": [123456, 123456]})
>>> print(df.to_markdown())
|    |      A |
|---:|-------:|
|  0 | 123456 |
|  1 | 123456 |

>>> df = pd.DataFrame({"A": [1234567, 1234567]})
>>> print(df.to_markdown())
|    |           A |
|---:|------------:|
|  0 | 1.23457e+06 |
|  1 | 1.23457e+06 |

>>> print(df)
         A
0  1234567
1  1234567

>>> print(df.A.dtype)
int64

Solution

  • At first I only found a workaround, not an explanation: convert the column to strings.

    >>> df = pd.DataFrame({"A": [1234567, 1234567]})
    >>> df["A"] = df.A.astype(str)
    >>> print(df.to_markdown())
    |    |       A |
    |---:|--------:|
    |  0 | 1234567 |
    |  1 | 1234567 |
    

    Update:

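    I think it is caused by two things: tabulate's type inference in _column_type, which ends up treating this column as float rather than int, and the conversion of the frame to a single NumPy array (df.values, shown at the end of this answer), which upcasts ints to floats when other columns are floats. The relevant tabulate function starts like this:

    def _column_type(strings, has_invisible=True, numparse=True):
        """The least generic type all column values are convertible to.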
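
The jump from 6 to 7 digits in the question is consistent with Python's "g" float format, which keeps 6 significant digits by default. As far as I know "g" is tabulate's default floatfmt, but treat that as an assumption, since it is not shown above. `format_g` is a hypothetical helper used only for illustration.

```python
def format_g(value):
    """Format a number the way a float column would be rendered with "g"."""
    # Assumption: the column was typed as float, so the int is cast to float first.
    return format(float(value), "g")


for value in (123456, 1234567):
    print(value, "->", format_g(value))
# 123456 -> 123456
# 1234567 -> 1.23457e+06
```

Once a column is typed as float, any value with more than 6 significant digits switches to scientific notation.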
    

    It can be solved by disabling the conversion via tablefmt="pretty", although the output is then an ASCII grid rather than a Markdown pipe table:

    print(df.to_markdown(tablefmt="pretty"))
    +---+---------+
    |   |    A    |
    +---+---------+
    | 0 | 1234567 |
    | 1 | 1234567 |
    +---+---------+

    However, this only helps when every column is an integer column. If the frame also has a float column, df.values has already upcast the integers to floats before tabulate sees them, so a trailing .0 appears:
    
    >>> df = pd.DataFrame({"A": [1234567, 1234567], "B": [0.1, 0.2]})
    >>> print(df)
             A    B
    0  1234567  0.1
    1  1234567  0.2
    
    >>> print(df.A.dtype)
    int64
    
    >>> print(df.to_markdown(tablefmt="pretty"))
    +---+-----------+-----+
    |   |     A     |  B  |
    +---+-----------+-----+
    | 0 | 1234567.0 | 0.1 |
    | 1 | 1234567.0 | 0.2 |
    +---+-----------+-----+
    
    >>> df.values
    array([[1.234567e+06, 1.000000e-01],
           [1.234567e+06, 2.000000e-01]])