What does tfidfvectorizer.transform() actually produce?


I am new to using TfidfVectorizer. When I ran the code below I got this output, but I could not work out what it means.

Code

    import numpy as np

    # tfidfvectorizer: a TfidfVectorizer fitted earlier on a larger corpus (not shown)
    X = ["Access modes govern the type of operations possible in the opened file. It refers to how the file will be used once its opened. These modes also define the location of the File Handle in the file.",
         "File handle is like a cursor, which defines from where the data has to be read or written in the file. There are 6 access modes in python."]

    X = np.array(X)

    ans = tfidfvectorizer.transform(X)

    print(ans)

Output

    (0, 247682)   0.34757472043242427
    (0, 235525)   0.11981132543319443
    (0, 232967)   0.27278177118815816
    (0, 165607)   0.6769351735727495
    (1, 247953)   0.2657562514567408
    (1, 232967)   0.2589999033874122
    (1, 230813)   0.28434013277955594
    (1, 202607)   0.22380408029504645

Can anyone explain what (0, 247682) and (1, 247953) mean?
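The original snippet cannot be run as-is because the fitted `tfidfvectorizer` is not shown. A minimal sketch that reproduces the same kind of result, assuming (unlike the original) that the vectorizer is fitted on these two strings only, so its column indices are small. It shows what `transform()` returns: a SciPy sparse matrix with one row per document and one column per vocabulary term.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.feature_extraction.text import TfidfVectorizer

X = np.array([
    "Access modes govern the type of operations possible in the opened file. "
    "It refers to how the file will be used once its opened. "
    "These modes also define the location of the File Handle in the file.",
    "File handle is like a cursor, which defines from where the data has to be "
    "read or written in the file. There are 6 access modes in python.",
])

# Assumption: fit on X itself. The original vectorizer was fitted on a much
# larger corpus, which is why its column ids reach values like 247682.
vectorizer = TfidfVectorizer()
vectorizer.fit(X)

ans = vectorizer.transform(X)

print(sp.issparse(ans))  # True: a SciPy sparse matrix, not a dense array
print(ans.shape)         # (number of documents, number of vocabulary terms)
print(ans.nnz)           # number of stored (non-zero) TF-IDF scores
print(ans.toarray())     # dense view: one row per document, one column per term
```

Printing `ans` directly lists only the stored entries as `(row, column) value`, which is the format in the output above.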


Solution

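  • There are two documents (strings) in your data set. transform() returns a sparse matrix with one row per document and one column per term in the vocabulary that the vectorizer learned when it was fitted. Each term in that vocabulary has a fixed column index (its word id); words in your strings that are not in the vocabulary are ignored. print() shows only the stored (non-zero) entries, each as (row, column) followed by the value.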

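    In (0, 247682):

    0 is the document id (the first string), 247682 is the word id (column index), and 0.34757472043242427 is that word's TF-IDF score in document 0. Likewise, (1, 247953) refers to word 247953 in the second document, with a score of 0.2657562514567408. Column ids this large mean the vectorizer's vocabulary has at least 247,954 terms, so it was fitted on a much larger corpus than these two strings.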