[SOLVED] How to build a normalized tf dataframe?

I want to apply this in my tf function, but I am unable to build the function.

My dataset looks like this

I have tried to build the function like this:

def term_document_matrix(data, vocab_list=None, doc_index='ID', text='text'):
    # Rows are vocabulary words, columns are document IDs
    tf_matrix = pd.DataFrame(columns=data[doc_index], index=vocab_list).fillna(0)
    a = float(input("enter the value"))
    for word in tf_matrix.index:
        for doc in data[doc_index]:
            X = ????????
            result = a + (1 - a) * (data[data[doc_index] == doc][text].values[0].count(word) / X)
            tf_matrix.loc[word, doc] = result
    return tf_matrix

But I am unable to complete it.

The parameters are described below:

parameters:

    data: DataFrame.
    Frequency of word calculated against the data.

    vocab_list: list of strings.
    Vocabulary of the documents.

    doc_index: str.
    Column name for document index in DataFrame passed.

    text: str.
    Column name containing text for all documents in DataFrame.

returns:
    tf_matrix: DataFrame.
    DataFrame containing term document matrix.

My goal is to get a dataframe like this

Solution

You can build the tf dataframe with CountVectorizer. Then divide each value by the maximum value of its column, repeating this for every column in your dataframe (assuming df holds the raw counts with one column per document):

df_1st = df.apply(lambda col: col / col.max())
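For instance, here is a minimal sketch of this normalization step on a hand-made count table. The words, document IDs, and counts are made up for illustration; only the `apply` call comes from the line above.

```python
import pandas as pd

# Hypothetical raw counts: rows are terms, columns are document IDs.
counts = pd.DataFrame(
    {"d1": [2, 1, 0], "d2": [0, 3, 3]},
    index=["apple", "banana", "cherry"],
)

# Divide every column by its own maximum, so the most frequent
# term in each document ends up with the value 1.0.
normalized = counts.apply(lambda col: col / col.max())
print(normalized)
```

Note that a document column containing only zeros would produce NaN here (0 divided by 0), so that case may need separate handling.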

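Then multiply each element by (1 - a) and add the scalar a. The scalar cannot be named lambda in Python, because lambda is a reserved word, so it is called a here as in the question:

df_2nd = df_1st.apply(lambda col: a + col * (1 - a))
tf_matrix = df_2nd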