
Count frequency of values in pandas DataFrame


I have this pandas DataFrame (pandas.core.frame.DataFrame):

Gorilla     A  T  C  C  A  G  C  T
Dog         G  G  G  C  A  A  C  T
Humano      A  T  G  G  A  T  C  T
Drosophila  A  A  G  C  A  A  C  C
Elefante    T  T  G  G  A  A  C  T
Mono        A  T  G  C  C  A  T  T
Unicornio   A  T  G  G  C  A  C  T
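To make the example reproducible, here is a minimal sketch that builds the same DataFrame. It assumes the species names are the index and labels the eight positions col1..col8 (hypothetical names, chosen to match the column labels in the answer below).

```python
import pandas as pd

# One aligned sequence per species, copied from the table above.
sequences = {
    "Gorilla":    "ATCCAGCT",
    "Dog":        "GGGCAACT",
    "Humano":     "ATGGATCT",
    "Drosophila": "AAGCAACC",
    "Elefante":   "TTGGAACT",
    "Mono":       "ATGCCATT",
    "Unicornio":  "ATGGCACT",
}

# Split each string into single characters so every position becomes a column.
# Assumption: species as the index, hypothetical column labels col1..col8.
df = pd.DataFrame(
    [list(seq) for seq in sequences.values()],
    index=list(sequences.keys()),
    columns=[f"col{i}" for i in range(1, 9)],
)
print(df)
```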

I would like to get a DataFrame like this:

    A   5 1 0 0 5 5 0 0
    C   0 0 1 4 2 0 6 1
    G   1 1 6 3 0 1 0 0
    T   1 5 0 0 0 1 1 6 

Basically, I want to count how often each base occurs, column by column, and build the second DataFrame shown above.

I want to do this because ultimately I would like to get a consensus string (the most frequent base in each column). It should be something like this: A T G C A A C T

Could anyone help me or give me some advice?


Solution

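  • Try applying value_counts to every column and filling the missing counts with 0:

    import pandas as pd

    # pd.Series.value_counts is used because the top-level pd.value_counts is deprecated in recent pandas;
    # .astype(int) turns the NaN-filled float counts back into integers
    result = df.apply(pd.Series.value_counts).fillna(0).astype(int)
    
       col1  col2  col3  col4  col5  col6  col7  col8
    A     5     1     0     0     5     5     0     0
    C     0     0     1     4     2     0     6     1
    G     1     1     6     3     0     1     0     0
    T     1     5     0     0     0     1     1     6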
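To go from the counts to the consensus string, one option is `DataFrame.idxmax`, which returns, for each column, the row label (here, the base) holding the maximum count. The sketch below rebuilds the question's data (species as the index, hypothetical col1..col8 labels), reindexes the profile to a fixed A, C, G, T order so that a base absent from every column still gets a row of zeros, and prints the profile in the same layout the question asks for.

```python
import pandas as pd

# Question's data; species as index, hypothetical column labels col1..col8.
sequences = {
    "Gorilla": "ATCCAGCT", "Dog": "GGGCAACT", "Humano": "ATGGATCT",
    "Drosophila": "AAGCAACC", "Elefante": "TTGGAACT",
    "Mono": "ATGCCATT", "Unicornio": "ATGGCACT",
}
df = pd.DataFrame(
    [list(seq) for seq in sequences.values()],
    index=list(sequences.keys()),
    columns=[f"col{i}" for i in range(1, 9)],
)

# Count each base per column; bases missing from a column give NaN, so fill with 0.
profile = df.apply(pd.Series.value_counts).fillna(0).astype(int)

# Fix the row order to A, C, G, T and add an all-zero row for any base never seen.
profile = profile.reindex(list("ACGT"), fill_value=0)

# idxmax() picks, per column, the base with the highest count.
consensus = "".join(profile.idxmax())
print(consensus)

# Print the profile as "A 5 1 0 ..." lines.
for base, row in profile.iterrows():
    print(base, *row)
```

Because `idxmax` returns the first label that reaches the maximum, ties are resolved in favour of the earlier base in the A, C, G, T ordering.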