python pandas dataframe melt

Pandas DataFrame stack multiple column values into single column


Assuming the following DataFrame:

  key.0 key.1 key.2  topic
1   abc   def   ghi      8
2   xab   xcd   xef      9

How can I combine the values of all the key.* columns into a single column 'key', with each value paired with the topic from its original row? This is the result I want:

   topic  key
1      8  abc
2      8  def
3      8  ghi
4      9  xab
5      9  xcd
6      9  xef

Note that the number of key.N columns varies, depending on some external N.


Solution

  • You can melt your dataframe:

    >>> keys = [c for c in df if c.startswith('key.')]
    >>> pd.melt(df, id_vars='topic', value_vars=keys, value_name='key')
    
       topic variable  key
    0      8    key.0  abc
    1      9    key.0  xab
    2      8    key.1  def
    3      9    key.1  xcd
    4      8    key.2  ghi
    5      9    key.2  xef
    

    The variable column also tells you which key.* column each value came from.


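    Since pandas 0.20, melt is also available as a DataFrame method. With no value_vars, it melts every column other than topic, so pass value_vars=keys if the frame has other columns: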

    >>> df.melt('topic', value_name='key').drop('variable', axis=1)
    
       topic  key
    0      8  abc
    1      9  xab
    2      8  def
    3      9  xcd
    4      8  ghi
    5      9  xef
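
Both melt calls above return the rows grouped by source column (all of key.0, then key.1, and so on), whereas the question asks for them grouped by topic. A minimal sketch of one way to get that order, assuming the sample frame from the question: tag each row with its position in a helper column (`_row`, a name made up for this example), melt, then stable-sort on that helper so the key.* columns keep their left-to-right order within each original row.

```python
import pandas as pd

# Sample frame from the question (index 1..2, as shown there).
df = pd.DataFrame(
    {
        "key.0": ["abc", "xab"],
        "key.1": ["def", "xcd"],
        "key.2": ["ghi", "xef"],
        "topic": [8, 9],
    },
    index=[1, 2],
)

# Pick up however many key.N columns exist.
keys = [c for c in df.columns if c.startswith("key.")]

# "_row" is a hypothetical helper column recording each row's position.
long = df.assign(_row=range(len(df))).melt(
    id_vars=["_row", "topic"], value_vars=keys, value_name="key"
)

result = (
    long.sort_values("_row", kind="mergesort")  # mergesort is stable
    .drop(columns=["_row", "variable"])
    .reset_index(drop=True)
)
result.index = range(1, len(result) + 1)  # 1-based index, as in the question

print(result)
```

Sorting on the row position rather than on topic keeps the original row order even when topic values repeat or are not sorted, and sorting on `variable` is avoided because it would put key.10 before key.2.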