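1 Google's massive data: word segmentation, sorting, indexing;
2 Google's file system: GFS (note: it implements redundancy control)
3 Google's data storage: Bigtable
4 Google's core algorithm: PageRank (assigns a value score to tens of billions of web pages)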
q = Gq
Note: q is the PageRank vector
G = αS + (1 - α)U/n
S is the destination-by-source stochastic matrix
U is the all-ones matrix
n is the number of nodes
α is a weight between 0 and 1 (e.g. 0.85)
Algorithm: iterative power method for finding the first (principal) eigenvector
q_{k+1} = G q_k    (k is the iteration index)
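As a rough illustration of the power iteration above, here is a minimal single-machine sketch in Python. It is not Google's implementation. The function name `pagerank`, the `links` dictionary format, the `tol` and `max_iter` parameters, and the handling of pages with no outgoing links (their score is spread evenly over all pages) are all assumptions made for this example. It never builds G explicitly: because the entries of q sum to 1, the (1 - α)U/n term simply adds (1 - α)/n to every node.

```python
def pagerank(links, alpha=0.85, tol=1e-10, max_iter=100):
    """Power iteration for q = Gq with G = alpha*S + (1 - alpha)*U/n.

    links: dict mapping each node to the list of nodes it links to.
           Every link target must also appear as a key (assumption).
    Returns a dict mapping node -> PageRank score (scores sum to 1).
    """
    nodes = sorted(links)
    n = len(nodes)
    q = {v: 1.0 / n for v in nodes}  # start from the uniform vector

    for _ in range(max_iter):
        # (1 - alpha) * (U/n) q: each node receives (1 - alpha)/n, since sum(q) == 1.
        new_q = {v: (1.0 - alpha) / n for v in nodes}

        # alpha * S q: each source splits its score evenly among its destinations.
        for src in nodes:
            targets = links[src]
            if targets:
                share = alpha * q[src] / len(targets)
                for dst in targets:
                    new_q[dst] += share
            else:
                # Dangling node: spread its score over all nodes (assumption).
                share = alpha * q[src] / n
                for dst in nodes:
                    new_q[dst] += share

        delta = sum(abs(new_q[v] - q[v]) for v in nodes)
        q = new_q
        if delta < tol:  # stop once the vector no longer changes
            break
    return q


if __name__ == "__main__":
    toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for node, score in sorted(pagerank(toy_graph).items()):
        print(node, round(score, 4))
```

On this three-page toy graph, page c ends up with the highest score, because it receives links from both a and b.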
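When n is on the order of hundreds of millions, no single machine, not even a supercomputer, can complete this computation, which is what motivates MapReduce below.
5 The MapReduce idea: a matrix product is just rows multiplied by columns, and each of those products is independent of the others, so the computation can be distributed across ten thousand PC nodes. (Ten thousand nobodies add up to one big shot.)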
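To make the idea concrete, here is a small Python sketch that simulates one matrix-vector product G q in the map/reduce style on a single machine. It only illustrates the pattern and is not the API of any real MapReduce framework. The names `map_phase`, `reduce_phase`, `matvec_mapreduce`, and the `num_workers` parameter are hypothetical, and the matrix is assumed to be given as a list of nonzero `(row, col, value)` entries.

```python
from collections import defaultdict


def map_phase(entries, q):
    """Map: each nonzero entry (row, col, value) emits (row, value * q[col])."""
    for row, col, value in entries:
        yield row, value * q[col]


def reduce_phase(pairs):
    """Reduce: sum all partial products that share the same row key."""
    sums = defaultdict(float)
    for row, partial in pairs:
        sums[row] += partial
    return dict(sums)


def matvec_mapreduce(entries, q, num_workers=4):
    """Simulate G q: split the entries among workers, map each chunk, then reduce.

    Rows with no nonzero entries do not appear in the result.
    """
    # Each chunk stands in for the slice of the matrix held by one PC node.
    chunks = [entries[i::num_workers] for i in range(num_workers)]

    emitted = []
    for chunk in chunks:  # in a real cluster these loops would run in parallel
        emitted.extend(map_phase(chunk, q))

    return reduce_phase(emitted)


if __name__ == "__main__":
    # G = [[1, 2], [3, 4]] stored as nonzero entries, q = [1, 1]
    G_entries = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (1, 1, 4.0)]
    print(matvec_mapreduce(G_entries, [1.0, 1.0]))
```

Each map call touches only one matrix entry and one element of q, which is why the work can be split across many cheap machines; only the reduce step needs to bring together the values for a given row.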