Every document and every query is represented as a tf-idf vector, so all of these vectors live in the same space: the one spanned by the terms of the vocabulary, with one axis per term.
See the example in my advisor's 5680 slides:
Thought experiment: take a document d and append it to itself (that is, copy the original document and paste it right after itself). Call this document d′.
“Semantically”, d and d′ have the same content.
The Euclidean distance between the two documents can be quite large.
The angle between the two documents is 0, corresponding to maximal similarity.
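The thought experiment can be checked numerically. The sketch below is only an illustration, not the example from the slides: it assumes raw term counts as weights (so appending d to itself exactly doubles every component), and the 3-term vector `d` is made up. With a log-scaled tf the doubling would only hold approximately.

```python
import math

def euclidean_distance(u, v):
    # Straight-line distance between the endpoints of two vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity(u, v):
    # Cosine of the angle between two vectors (1.0 means angle 0).
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical raw term counts for d over a 3-term vocabulary.
d = [3.0, 0.0, 4.0]
# Assumption: appending d to itself doubles every raw count.
d_prime = [2 * x for x in d]

dist = euclidean_distance(d, d_prime)  # equals the length of d itself
cos = cosine_similarity(d, d_prime)    # 1 up to rounding, i.e. angle 0
print(dist, cos)
```

The distance grows with the length of d, while the cosine stays at 1 no matter how many times the document is repeated, which is why the angle is the better similarity measure here.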
In short, the vector space model uses tf-idf to represent every document and every query as a vector in the space spanned by the vocabulary.
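A minimal sketch of that representation, under stated assumptions: the toy corpus, the helper names (`build_vocabulary`, `idf_weights`, `tfidf_vector`), and the specific weighting (raw count times log10(N/df)) are all chosen for illustration; the notes do not say which tf-idf variant is used.

```python
import math
from collections import Counter

def build_vocabulary(docs):
    # One axis per distinct term, in a fixed (sorted) order.
    return sorted({term for doc in docs for term in doc})

def idf_weights(docs, vocabulary):
    # Assumed variant: idf(t) = log10(N / df(t)).
    n = len(docs)
    return {
        t: math.log10(n / sum(1 for doc in docs if t in doc))
        for t in vocabulary
    }

def tfidf_vector(tokens, vocabulary, idf):
    # Assumed variant: raw term count times idf; terms outside
    # the vocabulary have no axis and are ignored.
    counts = Counter(tokens)
    return [counts[t] * idf[t] for t in vocabulary]

# Hypothetical, already-tokenized corpus.
docs = [
    ["cat", "sat", "mat"],
    ["cat", "cat", "dog"],
    ["dog", "barked"],
]
vocabulary = build_vocabulary(docs)
idf = idf_weights(docs, vocabulary)

# Documents and the query end up in the same space.
doc_vectors = [tfidf_vector(doc, vocabulary, idf) for doc in docs]
query_vector = tfidf_vector(["cat", "dog"], vocabulary, idf)
```

Because the query is mapped with the same vocabulary and idf weights as the documents, it can be compared against any entry of `doc_vectors` directly, for example by the angle between the two vectors.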