pythonpandas

Convert column to string, retaining NaN (as None or blank)


I would like to format a bunch of numbers in a list. The easiest way to do this is to convert it first to a bunch of strings. Here's an example of how I'm doing this:

df[col_name].astype('str').tolist()

However, the issue with this is I get values such as:

['12.19', '13.99', '1.00', 'nan', '9.00']

Is there a way I can return the 'nan' values as either None or an empty string, for example:

['12.19', '13.99', '1.00', None, '9.00']

Or:

['12.19', '13.99', '1.00', '', '9.00']

How would I do these two?


Solution

  • You can try like this.

    1st way:

    >>> df[col_name].apply(lambda v: str(v) if str(v) != 'nan' else None).tolist()
    ['12.19', '13.99', '1.00', None, '9.00']
    >>>
    >>> df[col_name].apply(lambda v: str(v) if str(v) != 'nan' else '').tolist()
    ['12.19', '13.99', '1.00', '', '9.00']
    >>>
    

    2nd way:

    >>> df[col_name].apply(lambda v: str(v) if not pd.isnull(v) else None).tolist()
    ['12.19', '13.99', '1.00', None, '9.00']
    >>>
    >>> df[col_name].apply(lambda v: str(v) if not pd.isnull(v) else '').tolist()
    ['12.19', '13.99', '1.00', '', '9.00']
    >>>
    

    Here is the detailed explanation.

    >>> import pandas as pd
    >>> import numpy as np
    >>>
    >>> df = pd.DataFrame({
    ... "fullname": ['P Y', 'P T', 'T Y', 'N A', 'P Z'],
    ... "age": [36, 80, 25, 8, 34],
    ... "salary": ['12.19', '13.99', '1.00', np.nan, '9.00']
    ... })
    >>>
    >>> df
      fullname  age salary
    0      P Y   36  12.19
    1      P T   80  13.99
    2      T Y   25   1.00
    3      N A    8    NaN
    4      P Z   34   9.00
    >>>
    >>> # PROBLEM
    ...
    >>> col_name = "salary"
    >>> df[col_name].astype("str").tolist()
    ['12.19', '13.99', '1.00', 'nan', '9.00']
    >>>
    >>> # SOLUTION
    ...
    >>> df[col_name].apply(lambda v: str(v) if str(v) != 'nan' else None)
    0    12.19
    1    13.99
    2     1.00
    3     None
    4     9.00
    Name: salary, dtype: object
    >>>
    >>> df[col_name].apply(lambda v: str(v) if str(v) != 'nan' else '')
    0    12.19
    1    13.99
    2     1.00
    3
    4     9.00
    Name: salary, dtype: object
    >>>
    >>> df[col_name].apply(lambda v: str(v) if str(v) != 'nan' else None).tolist()
    ['12.19', '13.99', '1.00', None, '9.00']
    >>>
    >>> df[col_name].apply(lambda v: str(v) if str(v) != 'nan' else '').tolist()
    ['12.19', '13.99', '1.00', '', '9.00']
    >>>
    >>> df[col_name].apply(lambda v: str(v) if not pd.isnull(v) else None).tolist()
    ['12.19', '13.99', '1.00', None, '9.00']
    >>>
    >>> df[col_name].apply(lambda v: str(v) if not pd.isnull(v) else '').tolist()
    ['12.19', '13.99', '1.00', '', '9.00']
    >>>