Because every document and query is represented as a tf-idf vector, all of these vectors live in one space whose axes are the terms of the vocabulary.


See the example in my advisor's 5680 slides: if ...

Thought experiment: take a document d and append it to itself. Call this document d′. That is, copy the original document and paste it after itself.

“Semantically”, d and d′ have the same content.

The Euclidean distance between the two documents can be quite large.

The angle between the two documents is 0, corresponding to maximal similarity.
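
A minimal sketch of the thought experiment, assuming raw term-frequency weights so that appending d to itself exactly doubles every component (multiplying each axis by its idf does not change this). The vector d and the helper names euclidean and cosine are made up for illustration. With log-scaled tf the angle would be small but generally not exactly 0.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical raw term-frequency vector for d over a 3-term vocabulary.
d = [3.0, 0.0, 4.0]
# Appending d to itself doubles every term count, so d' = 2d.
d_prime = [2 * x for x in d]

dist = euclidean(d, d_prime)  # equals the length of d, so it grows with the document
sim = cosine(d, d_prime)      # cosine of a zero angle, i.e. maximal similarity
print(dist, sim)
```

The distance equals the length of d itself, so a longer document ends up farther from its own doubled copy, while the cosine stays at its maximum.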


The vector space model uses tf-idf to represent every document and query as a vector in the space spanned by the vocabulary.
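
As a rough illustration of the idea, the sketch below builds tf-idf vectors for a made-up three-document corpus and a query, then ranks the documents by cosine similarity. The corpus, the query, and the weighting choice (raw tf times log10(N/df)) are assumptions for the example, not necessarily the variant used in the slides.

```python
import math
from collections import Counter

# Hypothetical toy corpus and query.
docs = {
    "d1": "new york times",
    "d2": "new york post",
    "d3": "los angeles times",
}
query = "new times"

# One axis per vocabulary term.
vocabulary = sorted({t for text in docs.values() for t in text.split()})
n_docs = len(docs)

# Document frequency: number of documents containing each term.
df = Counter(t for text in docs.values() for t in set(text.split()))
idf = {t: math.log10(n_docs / df[t]) for t in vocabulary}

def tfidf_vector(text):
    """Map a text to its tf-idf vector; terms outside the vocabulary are ignored."""
    tf = Counter(text.split())
    return [tf[t] * idf[t] for t in vocabulary]

def cosine(u, v):
    """Cosine similarity, defined as 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

doc_vectors = {name: tfidf_vector(text) for name, text in docs.items()}
query_vector = tfidf_vector(query)

# Rank documents by the angle they make with the query (smaller angle first).
ranking = sorted(docs, key=lambda name: cosine(query_vector, doc_vectors[name]), reverse=True)
print(ranking)
```

Documents and the query go through the same tfidf_vector function, which is what puts them in the same space and makes the angle between them meaningful.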