
8.X TF-IDF

by 쵸빙 2020. 7. 31.

In this lecture, we will learn about TF-IDF.

As we learned in the last lecture, the Vector Space Model (a.k.a. Bag-of-Words) works as follows.

Vector Space Model

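A document (datapoint) is a vector of counts over each word (feature):

$$V_d = \left[\, n(w_{1,d}),\ n(w_{2,d}),\ \dots,\ n(w_{T,d}) \,\right]$$

where n( · ) counts the number of occurrences of a word in document d, and T is the size of the vocabulary. In other words, V_d is just a histogram over words.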
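A minimal sketch of building V_d in plain Python, assuming each document is already a list of integer word IDs in the range 0..vocab_size-1 (for images, these would be the visual-word indices assigned by the codebook). The function name `bow_histogram` is hypothetical, not from the lecture.

```python
from collections import Counter

def bow_histogram(word_ids, vocab_size):
    """Return the bag-of-words vector V_d: entry i is the number of
    times word i occurs in the document (n(w_i, d))."""
    counts = Counter(word_ids)
    # Words that never occur get a count of 0, so the vector always has vocab_size entries.
    return [counts.get(i, 0) for i in range(vocab_size)]

# A toy "document" over a 5-word vocabulary (word IDs 0..4).
doc = [0, 2, 2, 4, 2, 0]
print(bow_histogram(doc, vocab_size=5))  # [2, 0, 3, 0, 1]
```

Word order is thrown away: only how often each word occurs survives, which is exactly what "a histogram over words" means.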

What is the similarity between two documents?

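We can use any distance, but cosine distance is fast to compute. It compares the angle θ between the two document vectors:

$$\cos\theta = \frac{V_i \cdot V_j}{\lVert V_i \rVert \, \lVert V_j \rVert}$$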
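A minimal sketch of the cosine measure in plain Python. The cosine of the angle is the dot product divided by the product of the vector norms, so it is 1 when two histograms have the same proportions regardless of document length; the "distance" form used here, 1 − cos θ, is one common convention (an assumption, as are the helper names).

```python
import math

def cosine_similarity(u, v):
    """cos(theta) = (u . v) / (||u|| * ||v||) for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention: an empty document is similar to nothing
    return dot / (norm_u * norm_v)

def cosine_distance(u, v):
    """1 - cos(theta): 0 for vectors pointing the same way,
    1 when non-negative count vectors share no words."""
    return 1.0 - cosine_similarity(u, v)

a = [2, 0, 3, 0, 1]
b = [4, 0, 6, 0, 2]  # same proportions as a, twice as long
c = [0, 5, 0, 1, 0]  # no words in common with a
print(cosine_similarity(a, b))  # approximately 1.0
print(cosine_distance(a, c))    # 1.0
```

Only entries where both vectors are nonzero contribute to the dot product, so with sparse word histograms the computation stays cheap.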

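But not all words are created equal: words that appear in almost every document (such as "the" or "a" in text) produce large counts while saying little about what a particular document is about.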

● TF-IDF

: Term Frequency Inverse Document Frequency

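We weight each word by a heuristic: its term frequency (how often the word occurs in the document) is multiplied by its inverse document frequency (small for words that appear in many documents, large for rare ones).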
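One common way to write this weighting (an assumed standard form; the lecture's exact equation is not reproduced here) scales the count of word i in document d by an IDF factor α_i, where D is the number of documents and the denominator counts the documents that contain w_i:

```latex
V_d = \left[\, n(w_{1,d})\,\alpha_1,\ n(w_{2,d})\,\alpha_2,\ \dots,\ n(w_{T,d})\,\alpha_T \,\right],
\qquad
\alpha_i = \log \frac{D}{\sum_{d'} \mathbf{1}\left[ w_i \in d' \right]}
```

For example, with D = 4 documents, a word that appears in all four gets α = log(4/4) = 0 and is effectively ignored, while a word that appears in only one gets α = log 4 ≈ 1.39 (natural log).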

So instead of raw word counts, each document is represented by its TF-IDF-weighted counts.
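A minimal end-to-end sketch in plain Python, assuming documents are already raw count histograms over a shared vocabulary (for images, visual-word histograms) and using the natural-log IDF α_i = log(D / df_i), where df_i is the number of documents containing word i. The helper names (`idf_weights`, `tfidf`, `cosine_similarity`) are hypothetical, and giving a word that occurs in no document weight 0 is an assumption to avoid dividing by zero; libraries such as scikit-learn's `TfidfVectorizer` use smoothed variants of this formula.

```python
import math

def idf_weights(histograms):
    """alpha_i = log(D / df_i), where df_i is the number of documents containing word i."""
    num_docs = len(histograms)
    vocab_size = len(histograms[0])
    weights = []
    for i in range(vocab_size):
        df = sum(1 for h in histograms if h[i] > 0)
        # Assumption: a word that never occurs gets weight 0 (avoids division by zero).
        weights.append(math.log(num_docs / df) if df > 0 else 0.0)
    return weights

def tfidf(histogram, weights):
    """Multiply each raw count n(w_i, d) by its IDF weight alpha_i."""
    return [n * a for n, a in zip(histogram, weights)]

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

# Toy corpus: 3 documents over a 4-word vocabulary.
# Word 0 appears often in every document (like "the"), so it carries no information.
docs = [
    [5, 2, 0, 0],
    [5, 0, 2, 0],
    [5, 0, 0, 2],
]
alpha = idf_weights(docs)
weighted = [tfidf(h, alpha) for h in docs]

print(alpha[0])                                     # 0.0, i.e. log(3/3)
print(cosine_similarity(docs[0], docs[1]))          # about 0.86 (25/29), driven by word 0
print(cosine_similarity(weighted[0], weighted[1]))  # 0.0: no informative word in common
```

With raw counts, the shared high-frequency word makes documents 0 and 1 look similar; after TF-IDF weighting it gets weight 0, and the similarity drops to 0 because the two documents share no informative word.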
