0. Motivation

"The meaning of entities, and the meaning of grammatical relations among them, is related to the restriction of combinations of these entities relative to other entities."

1. Vector Space Models and Similarity Computation

1.1 Vector Space Model

(0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0)

1.2 Basic Semantic Similarity Computation

a) Euclidean vector space

b) Probabilistic vector space
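As a rough illustration of how similarity might be computed in each of the two spaces listed above, here is a minimal sketch. It assumes cosine similarity for the Euclidean case and Jensen-Shannon divergence for the probabilistic case; the text does not name specific measures, so both are assumptions. The vector `u` is the one shown in section 1.1, while `v` and all function names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors in a Euclidean space."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)

def normalize(counts):
    """Turn non-negative counts into a probability distribution."""
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot normalize an all-zero vector")
    return [c / total for c in counts]

def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, 0 for identical distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

u = [0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # vector from section 1.1
v = [0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # hypothetical second word

print(cosine_similarity(u, v))                    # 0.75 (higher = more similar)
print(js_divergence(normalize(u), normalize(v)))  # 0.25 (lower = more similar)
```

Note that the two measures point in opposite directions: cosine is a similarity, whereas the divergence is a distance-like quantity that would need to be inverted or negated before being used as a similarity score.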

a) Prototype method

b) Exemplar method
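The contrast between the two methods listed above can be sketched as follows. This is one possible reading, not necessarily the formulation the author had in mind: the prototype view collapses all occurrences of a word into a single averaged vector, while the exemplar view keeps every occurrence vector and compares occurrences directly. The toy occurrence vectors, the function names, and the choice of the maximum pairwise cosine as the exemplar score are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def prototype_similarity(occ_a, occ_b):
    """Prototype view: one averaged vector per word, then a single comparison."""
    return cosine(centroid(occ_a), centroid(occ_b))

def exemplar_similarity(occ_a, occ_b):
    """Exemplar view: keep every occurrence and score the best-matching pair.
    Taking the maximum is an assumption; a mean over pairs is another option."""
    return max(cosine(a, b) for a in occ_a for b in occ_b)

# Hypothetical context vectors, one per occurrence of each word.
occ_bank = [[1, 1, 0, 0],   # "bank" in a financial context
            [0, 0, 1, 1]]   # "bank" in a riverside context
occ_river = [[0, 0, 1, 1],
             [0, 0, 1, 0]]

print(prototype_similarity(occ_bank, occ_river))  # roughly 0.67
print(exemplar_similarity(occ_bank, occ_river))   # about 1.0
```

In this toy case the averaged prototype for the ambiguous word blurs its two uses together, whereas the exemplar score is driven by the single occurrence that matches best.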

4. Open Issues

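[Note] This article gives readers unfamiliar with word semantic similarity computation a brief introduction to the basic concepts and methods of the field, along with the author's own understanding of some of its basic problems. To experts, it will inevitably seem rather superficial.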

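[1] Views on this differ from person to person, so some readers may disagree.

[2] Of course, the word has to be sufficiently important for this to work; we do not discuss that issue here.

[3] This method is also known as the "multi-prototype" method.

[4] The case of the exemplar method is left to the reader.

[5] The choice of weighting scheme does not change the essence of the approach.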

`i` Zellig S. Harris. 1968. Mathematical Structures of Language. Wiley, New York, NY, USA.

`ii` Joseph Reisinger and Raymond J. Mooney. 2010. Multi-prototype vector-space models of word meaning. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (HLT '10). Association for Computational Linguistics, Stroudsburg, PA, USA, 109-117.

`iii` Patrick Andre Pantel. 2003. Clustering by Committee. Ph.D. Dissertation. University of Alberta, Edmonton, Alta., Canada. Advisor(s) Dekang Lin. AAINQ82151.

`iv` Dekang Lin. 1998. Automatic retrieval and clustering of similar words. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2 (ACL '98). Association for Computational Linguistics, Stroudsburg, PA, USA, 768-774. DOI: 10.3115/980691.980696 http://dx.doi.org/10.3115/980691.980696

`v` Katrin Erk and Sebastian Padó. 2010. Exemplar-based models for word meaning in context. In Proceedings of the ACL 2010 Conference Short Papers (ACLShort '10). Association for Computational Linguistics, Stroudsburg, PA, USA, 92-97.

`vi` Jeff Mitchell and Mirella Lapata. 2008. Vector-based models of semantic composition. In Proceedings of ACL-08: HLT, 236-244.

`vii` Georgiana Dinu and Mirella Lapata. 2010. Measuring distributional similarity in context. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP '10). Association for Computational Linguistics, Stroudsburg, PA, USA, 1162-1172.

`viii` Diarmuid Ó Séaghdha and Anna Korhonen. 2011. Probabilistic models of similarity in syntactic context. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '11). Association for Computational Linguistics, Stroudsburg, PA, USA, 1047-1057.

`ix` Tim Van de Cruys, Thierry Poibeau, and Anna Korhonen. 2011. Latent vector weighting for word meaning in context. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '11). Association for Computational Linguistics, Stroudsburg, PA, USA, 1012-1022.

by sunshuqi