In this lecture, we will learn about TF-IDF.
As we learned in the last lecture, the Vector Space Model (aka Bag-of-Words) works as follows.
A document (datapoint) is a vector of counts over each word (feature).
Vd is just a histogram over words.
n( · ) counts the number of occurrences of a word in the document.
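As a minimal sketch of this count vector in Python: the function name `bow_vector`, the whitespace tokenizer, and the tiny vocabulary are assumptions made for illustration, not part of the lecture.

```python
from collections import Counter

def bow_vector(document, vocabulary):
    """Return the count vector of `document` over `vocabulary`.

    Entry i is n(w_i, d): how many times word w_i occurs in the document.
    Tokenization here is a naive lowercase whitespace split (an assumption).
    """
    counts = Counter(document.lower().split())
    return [counts[word] for word in vocabulary]

# Hypothetical vocabulary and document, for illustration only.
vocabulary = ["cat", "dog", "sat", "the"]
v_d = bow_vector("The cat sat the cat", vocabulary)
print(v_d)  # [2, 0, 1, 2]
```

Words outside the vocabulary are simply ignored, and vocabulary words that never appear get a count of zero.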
What is the similarity between two documents?
We can use any distance, but the cosine distance is fast to compute.
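A small sketch of the cosine measure between two count vectors follows. It assumes the usual definition (dot product divided by the product of the vector lengths), with the distance taken as one minus the similarity. The function names and the zero-vector convention are choices made for this example.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Convention chosen here: an empty document is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)

def cosine_distance(u, v):
    """One minus the cosine similarity."""
    return 1.0 - cosine_similarity(u, v)

# Two hypothetical documents with the same word proportions but different lengths.
print(cosine_distance([2, 0, 1, 2], [4, 0, 2, 4]))
```

Because the measure depends only on the angle between the vectors, a document and a twice-as-long copy of it come out at distance (approximately) zero.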
But not all words are created equal.
● TF-IDF: Term Frequency-Inverse Document Frequency
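We weight each word by a heuristic: a word that occurs often in a document but appears in few documents overall gets a high weight, while a word that appears in almost every document gets a low weight.
So instead of using raw counts, we can weight each word count by its TF-IDF score.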
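The sketch below shows one common variant of this weighting, assumed here since the exact formula is not given in the text: each count n(w, d) is multiplied by log(N / df(w)), where N is the number of documents and df(w) is the number of documents containing w. The function name `tfidf_vectors`, the whitespace tokenizer, and the example documents are illustrative assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """Return (vocabulary, vectors) with TF-IDF weighted counts.

    Weight of word w in document d: n(w, d) * log(N / df(w)).
    This is one common variant; other normalizations exist.
    """
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({w for tokens in tokenized for w in tokens})
    n_docs = len(tokenized)

    # df(w): number of documents that contain w at least once.
    doc_freq = Counter(w for tokens in tokenized for w in set(tokens))
    idf = {w: math.log(n_docs / doc_freq[w]) for w in vocabulary}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[w] * idf[w] for w in vocabulary])
    return vocabulary, vectors

# Hypothetical two-document collection.
vocab, vecs = tfidf_vectors(["the cat sat", "the dog barked"])
for word, weight in zip(vocab, vecs[0]):
    print(word, round(weight, 3))
```

In this toy collection "the" occurs in every document, so its weight is log(2/2) = 0, while "cat" occurs in only one of the two documents and keeps a positive weight of log 2.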