Adding leading zeros to data columns when loading from CSV files using pandas

I have a script to load and combine time-series data from two .csv files that have the same base filename (specified as a path using Pathlib), but different suffixes. Given that the importing of the data from the .csv files is working correctly, I include here a minimal working example with data provided using StringIO:

import pandas as pd
from io import StringIO

def load_data():
    headers_0 = ['a', 'b', 'c']  # Headers for first file. May have more entries than columns in file
    headers_1 = ['d', 'e']       # Headers for second file.

    data_0 = pd.read_csv(
        StringIO(
          """
            0 1 2
            3 4 5
            6 7 8
          """
          )
        , header=None, delim_whitespace=True)
    data_0.columns = headers_0[0:data_0.shape[1]]

    data_1 = pd.read_csv(
        StringIO(
          """
            A B
            C D
          """
          )
        , header=None, delim_whitespace=True)
    data_1.columns = headers_1[0:data_1.shape[1]]

    data = data_0.join(data_1)
    data.fillna(0, inplace=True)

    print(data)

Thus far, I have only been using load_data for datasets where both data_0 and data_1 have the same length of columns (same length of time-series). However, I am now encountering a situation where data_1 has a shorter column length than data_0; this is because the data in data_1 only starts getting recorded at some later time than data_0.

How do I use pandas to fill the columns of data_1 with leading zeros, such that the column length in both data_0 and data_1 is the same? I believe that the line data.fillna(0, inplace=True) is filling the length mismatch with trailing zeros; is there an obvious way to change this to leading zeros? Note that I do not know the length of either dataset a priori, so I would appreciate help towards a solution that works based on the length of the data loaded using pandas. Running the example above gives an output

   a  b  c  d  e
0  0  1  2  A  B
1  3  4  5  C  D
2  6  7  8  0  0

which is evidently not the desired effect (the zeros in columns d and e would be in row 0, rather than 2).

I have tried different options for DataFrame.fillna such as method=backfill, but none of these attempts have yielded the expected result.

Solution

Here's one approach:

Reverse row order of your dfs with [::-1] and apply df.reset_index with drop=True
Apply pd.concat on axis=1, and reverse + reset_index again
Finally, apply df.fillna

import pandas as pd
from io import StringIO

def load_data(filename):
    
    headers_0 = ['a', 'b', 'c']
    headers_1 = ['d', 'e']

    # using StringIO for mre 
    # + `sep="\s+"` for `delim_whitespace=True` (deprecated since 2.2.0)
    data_0 = pd.read_csv(StringIO(c_0), header=None, sep="\s+")
    data_0.columns = headers_0[0:data_0.shape[1]]

    data_1 = pd.read_csv(StringIO(c_1), header=None, sep="\s+")
    data_1.columns = headers_1[0:data_1.shape[1]]

    data = (
        pd.concat(
            [df[::-1].reset_index(drop=True) for df in [data_0, data_1]], 
            axis=1
        )[::-1]
        .reset_index(drop=True)
        .fillna(0)
    )

    return data

c_0 = """0 1 2
3 4 5
6 7 8"""

c_1 = """A B
C D
"""

load_data('c')

   a  b  c  d  e
0  0  1  2  0  0
1  3  4  5  A  B
2  6  7  8  C  D

Will work regardless of which one is longer.

Another option could be to use df.shift with fill_value:

def load_data2(filename):
    
    # `data_0` and `data_1` same as above

    diff = len(data_0) - len(data_1)
    
    data = pd.concat([data_0, data_1], axis=1)
    if diff > 0:
        data[data_1.columns] = data[data_1.columns].shift(diff, fill_value=0)
    elif diff < 0:
        data[data_0.columns] = data[data_0.columns].shift(abs(diff), fill_value=0)
    
    return data

The nice thing about this option is that it will only fill NaN values in the shifted area. E.g.:

c_0 = """0 1 2
3 4 5
6 7 8"""

c_1 = """A B
C 
"""

load_data2('c')

   a  b  c  d    e
0  0  1  2  0    0
1  3  4  5  A    B
2  6  7  8  C  NaN # with my `load_data` above, this `NaN` gets filled too!

Of course, if you want all NaN values filled anyway, just use fillna here as well.