python pandas dataframe reshape lreshape

Reshape a pandas DataFrame


Suppose I have a DataFrame like this one:

import pandas as pd
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]], columns=['A', 'B', 'A1', 'B1'])

   A   B  A1  B1
0  1   2   3   4
1  5   6   7   8
2  9  10  11  12

I would like to have a dataframe which looks like:

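[image: a two-column DataFrame with columns A and B, in which the A1 and B1 values are stacked below the original A and B values]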

What does not work:

new_rows = int(df.shape[1]/2) * df.shape[0]
new_cols = 2
df.values.reshape(new_rows, new_cols, order='F')

Of course I could loop over the data and build a new list of lists, but there must be a better way. Any ideas?


Solution

  • The pd.wide_to_long function is built for almost exactly this situation, where several columns share a variable prefix (the stub name) and differ only in a numeric suffix. The only difference here is that your first set of columns has no suffix, so you will need to rename your columns first.

    The only issue with pd.wide_to_long is that, unlike melt, it must have an identification variable, i. reset_index is used to create this uniquely identifying column, which is dropped later. I think this might get corrected in the future.

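    # Give every column a numeric suffix and expose the row index as the column 'index'
    df1 = df.rename(columns={'A':'A1', 'B':'B1', 'A1':'A2', 'B1':'B2'}).reset_index()
    # Stack the A*/B* columns; the suffix goes into 'id', and 'index' is dropped at the end
    pd.wide_to_long(df1, stubnames=['A', 'B'], i='index', j='id')\
      .reset_index()[['A', 'B', 'id']]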
    
        A   B id
    0   1   2  1
    1   5   6  1
    2   9  10  1
    3   3   4  2
    4   7   8  2
    5  11  12  2
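
If there are more than a couple of column groups, writing the rename dictionary by hand gets tedious. Here is a minimal, self-contained sketch of the same approach with the renaming done programmatically. The helper add_suffixes and the names stubs, wide and long_df are made up for illustration (they are not part of pandas), and the sketch assumes every column is either a bare stub or a stub followed by digits.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
    columns=["A", "B", "A1", "B1"],
)

stubs = ["A", "B"]


def add_suffixes(col):
    # Hypothetical helper: a bare stub "A" becomes "A1", and an existing
    # numeric suffix is shifted up by one ("A1" becomes "A2").
    for stub in stubs:
        if col == stub:
            return f"{stub}1"
        suffix = col[len(stub):]
        if col.startswith(stub) and suffix.isdigit():
            return f"{stub}{int(suffix) + 1}"
    return col  # leave any other column untouched


# rename accepts a callable; reset_index supplies the unique id column "index"
wide = df.rename(columns=add_suffixes).reset_index()

long_df = (
    pd.wide_to_long(wide, stubnames=stubs, i="index", j="id")
    .reset_index()[["A", "B", "id"]]
)
print(long_df)
```

This should give the same six rows as the output above. If you do not need to know which group a row came from, select only ['A', 'B'] in the last step.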