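Neighborhood-based algorithms are the most basic algorithms in personalized recommender systems. They have been studied in depth in academia and are widely used in industry. They fall into two classes: user-based collaborative filtering and item-based collaborative filtering. This article covers item-based collaborative filtering (ItemCF) and ALS-based collaborative filtering.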

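I. Item-based collaborative filtering

1. Basic idea

ItemCF analyzes users' historical behavior records to compute the similarity between items: if most users who like item A also like item B, then A and B are considered similar to some degree. This makes it easy to give a reasonable explanation for a recommendation. For example, if you bought "Introduction to Data Mining", you will be recommended "Machine Learning".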

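2. Similarity measures

Common measures of similarity between items include co-occurrence similarity, Euclidean distance, the Pearson correlation coefficient, cosine similarity, and Jaccard similarity. Each is described below.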

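2.1 Co-occurrence similarity

The co-occurrence similarity is computed as:

$$ W(x, y) = \frac{|N(x) \cap N(y)|}{|N(x)|} $$

Here N(x) is the set of users who like item x. The denominator is the number of users who like item x, and the numerator is the number of users interested in both item x and item y. The formula can therefore be read as the probability that a user interested in x is also interested in y (similar to association rules).

This formula has a problem: if item y is popular and liked by many users, W(x, y) becomes large and approaches 1. As a result, every item ends up highly similar to popular items.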

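2.2 Improved co-occurrence similarity

To reduce the influence of popular items on co-occurrence similarity, the improved formula penalizes the weight of item y, which makes a popular item less likely to be similar to many items. With N(x) the set of users who like item x, the improved formula is:

$$ W(x, y) = \frac{|N(x) \cap N(y)|}{\sqrt{|N(x)| \, |N(y)|}} $$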
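The effect of the penalty is easiest to see on a tiny example. The sketch below is only an illustration: the `likedBy` map and its user ids are made-up toy data, and `plain` and `improved` are hypothetical helper names for the two definitions described above (shared users divided by the users of x, and shared users divided by the square root of the product of both user counts).

```scala
object CoOccurrenceDemo {
  // Hypothetical toy data: for each item, the ids of the users who like it.
  val likedBy: Map[String, Set[Int]] = Map(
    "x" -> Set(1, 2),
    "y" -> Set(1, 2, 3, 4, 5, 6, 7, 8), // a popular item
    "z" -> Set(1, 2)                    // a niche item liked by the same users as x
  )

  // Plain co-occurrence: shared users divided by the users of x.
  def plain(x: String, y: String): Double =
    (likedBy(x) & likedBy(y)).size.toDouble / likedBy(x).size

  // Improved co-occurrence: shared users divided by sqrt(users of x * users of y).
  def improved(x: String, y: String): Double =
    (likedBy(x) & likedBy(y)).size / math.sqrt(likedBy(x).size.toDouble * likedBy(y).size)

  def main(args: Array[String]): Unit = {
    println(plain("x", "y"))    // 2 / 2 = 1.0: the popular item looks identical to x
    println(plain("x", "z"))    // 2 / 2 = 1.0
    println(improved("x", "y")) // 2 / sqrt(2 * 8) = 0.5: the popular item is penalized
    println(improved("x", "z")) // 2 / sqrt(2 * 2) = 1.0
  }
}
```

With the plain measure the popular item y and the niche item z are indistinguishable from the point of view of x; the improved measure keeps z at 1.0 and halves the score of y.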

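2.3 Euclidean distance

In mathematics, the Euclidean distance or Euclidean metric is the "ordinary" (straight-line) distance between two points in Euclidean space. With this distance, Euclidean space becomes a metric space. The associated norm is called the Euclidean norm. Older literature refers to it as the Pythagorean metric. It is computed as:

$$ d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} $$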
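As a quick illustration of the straight-line distance, here is a minimal sketch. The object and function names are hypothetical, and the two vectors are made-up ratings chosen so the result is easy to check by hand.

```scala
object EuclideanDemo {
  // Straight-line distance between two rating vectors of equal length.
  def distance(x: Seq[Double], y: Seq[Double]): Double = {
    require(x.length == y.length, "vectors must have the same length")
    math.sqrt(x.zip(y).map { case (a, b) => (a - b) * (a - b) }.sum)
  }

  def main(args: Array[String]): Unit = {
    // Differences are 3 and 4, so the distance is sqrt(9 + 16) = 5.0
    println(distance(Seq(1.0, 2.0), Seq(4.0, 6.0)))
  }
}
```

Unlike the similarity measures in this section, a smaller distance means the two items are closer.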

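2.4 Pearson correlation coefficient

The Pearson correlation coefficient is the correlation coefficient from probability theory, with values in [-1, +1]. A value greater than zero means the two variables are positively correlated; a value less than zero means they are negatively correlated. For the ratings x_i and y_i given to two items by the n users who rated both, it is computed as:

$$ r(x, y) = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n \sum x_i^2 - \left(\sum x_i\right)^2} \, \sqrt{n \sum y_i^2 - \left(\sum y_i\right)^2}} $$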
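The sketch below computes the Pearson coefficient from running sums (count, sum of products, sums, sums of squares), which is the form that suits a distributed aggregation. It is an illustration only: `PearsonDemo` and its vectors are made up, and returning 0 when the denominator vanishes is a convention assumed here, matching the `correlation` function later in this article.

```scala
object PearsonDemo {
  // Pearson correlation from sufficient statistics of two equally long rating vectors.
  def pearson(xs: Seq[Double], ys: Seq[Double]): Double = {
    require(xs.length == ys.length, "both vectors need one rating per shared user")
    val n = xs.length.toDouble
    val sumX = xs.sum
    val sumY = ys.sum
    val sumXY = xs.zip(ys).map { case (x, y) => x * y }.sum
    val sumX2 = xs.map(x => x * x).sum
    val sumY2 = ys.map(y => y * y).sum
    val numerator = n * sumXY - sumX * sumY
    val denominator =
      math.sqrt(n * sumX2 - sumX * sumX) * math.sqrt(n * sumY2 - sumY * sumY)
    // A constant vector has zero variance; treat the correlation as 0 in that case.
    if (denominator == 0) 0.0 else numerator / denominator
  }

  def main(args: Array[String]): Unit = {
    println(pearson(Seq(1.0, 2.0, 3.0), Seq(2.0, 4.0, 6.0))) // close to +1: ratings rise together
    println(pearson(Seq(1.0, 2.0, 3.0), Seq(6.0, 4.0, 2.0))) // close to -1: ratings move in opposite directions
    println(pearson(Seq(1.0, 2.0, 3.0), Seq(5.0, 5.0, 5.0))) // 0.0: the second vector is constant
  }
}
```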

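2.5 Cosine similarity

Cosine similarity is the cosine of the angle between two vectors in a multidimensional space, with values in [-1, 1]. The larger the value, the smaller the angle, and the more similar the two vectors are. It is computed as:

$$ \cos(x, y) = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert} = \frac{\sum x_i y_i}{\sqrt{\sum x_i^2} \, \sqrt{\sum y_i^2}} $$

This formula considers only users' ratings, so highly rated items are likely to rank first regardless of any other information about the items.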
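A small sketch makes the direction of the measure concrete: the cosine grows as the angle between the vectors shrinks. The names and vectors below are made up for illustration.

```scala
object CosineDemo {
  def dot(a: Seq[Double], b: Seq[Double]): Double =
    a.zip(b).map { case (p, q) => p * q }.sum

  def norm(a: Seq[Double]): Double = math.sqrt(dot(a, a))

  // Cosine of the angle between two non-zero vectors.
  def cosine(a: Seq[Double], b: Seq[Double]): Double =
    dot(a, b) / (norm(a) * norm(b))

  def main(args: Array[String]): Unit = {
    val a = Seq(3.0, 4.0)
    println(cosine(a, Seq(6.0, 8.0)))   // same direction: 50 / (5 * 10) = 1.0
    println(cosine(a, Seq(4.0, -3.0)))  // perpendicular: 0 / 25 = 0.0
    println(cosine(a, Seq(-3.0, -4.0))) // opposite direction: -25 / 25 = -1.0
  }
}
```

Note that scaling a vector does not change the result: (3, 4) and (6, 8) have cosine 1.0 even though one user rates twice as high as the other.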

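2.6 Improved cosine similarity

The improved cosine similarity also takes into account the number of users the two vectors have in common (n_{xy}), the size of vector X (n_x), and the size of vector Y (n_y). It is computed as:

$$ \mathrm{impCos}(x, y) = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert} \cdot \frac{n_{xy}}{n_x \, \log_{10}(10 + n_y)} $$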

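2.7 Jaccard similarity

This measure ignores the rating values and considers only how many users the two sets have in common. It is computed as:

$$ J(X, Y) = \frac{|X \cap Y|}{|X \cup Y|} = \frac{|X \cap Y|}{|X| + |Y| - |X \cap Y|} $$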
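Because only set membership matters, the Jaccard measure can be sketched directly on sets of user ids. The data below is made up, and returning 0 for two empty sets is an assumption of this sketch.

```scala
object JaccardDemo {
  // Shared users divided by all users who rated either item.
  def jaccard(a: Set[Int], b: Set[Int]): Double = {
    val inter = (a & b).size
    val union = a.size + b.size - inter
    if (union == 0) 0.0 else inter.toDouble / union
  }

  def main(args: Array[String]): Unit = {
    // Users 2 and 3 rated both items; users 1, 2, 3, 4 rated at least one of them.
    println(jaccard(Set(1, 2, 3), Set(2, 3, 4))) // 2 / 4 = 0.5
  }
}
```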

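3. Predicting user ratings

The similarity measures give an item-to-item similarity matrix Score(i, p). The predicted rating of user u for item p is then computed as:

$$ \mathrm{pred}(u, p) = \frac{\sum_{i \in \mathrm{ratedItems}(u)} \mathrm{Score}(i, p) \cdot r_{u,i}}{\sum_{i \in \mathrm{ratedItems}(u)} \mathrm{Score}(i, p)} $$

where u is the user, p is the item, ratedItems(u) is the set of items the user has rated, and r_{u,i} is the user's rating of item i.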
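A common form of this prediction is the similarity-weighted average of the ratings the user has already given, and that form is assumed in the sketch below. `PredictDemo`, the item ids, and the similarity values are all hypothetical.

```scala
object PredictDemo {
  // ratings:     item id -> rating r(u, i) the user gave
  // simToTarget: item id -> Score(i, p), similarity of item i to the target item p
  // Returns None when no rated item has a similarity to the target or the weights sum to 0.
  def predict(ratings: Map[Int, Double], simToTarget: Map[Int, Double]): Option[Double] = {
    // toSeq keeps duplicate values that a Set would silently merge
    val common = (ratings.keySet & simToTarget.keySet).toSeq
    val weightSum = common.map(simToTarget).sum
    if (common.isEmpty || weightSum == 0.0) None
    else Some(common.map(i => simToTarget(i) * ratings(i)).sum / weightSum)
  }

  def main(args: Array[String]): Unit = {
    val ratings = Map(1 -> 4.0, 2 -> 2.0)
    val sims = Map(1 -> 0.75, 2 -> 0.25)
    // (0.75 * 4.0 + 0.25 * 2.0) / (0.75 + 0.25) = 3.5
    println(predict(ratings, sims)) // Some(3.5)
  }
}
```

The prediction stays within the range of the user's own ratings as long as all similarities are positive, and it leans toward the rating of the item most similar to the target.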

4. Code implementation

4.1 Environment and dependencies

Java 1.8.0_172 + Scala 2.11.8 + Spark 2.3.1

<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.3.1</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>2.3.1</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.11</artifactId>
            <version>2.3.1</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-mllib -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib_2.11</artifactId>
            <version>2.3.1</version>
        </dependency>

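4.2 Item similarity code

4.2.1 Improved co-occurrence similarity

/**
    * Improved co-occurrence similarity.
    * Co-occurrence similarity = numRatersPairs / numRaters: users interested in both A and B divided by users interested in A.
    * It describes how likely a user who likes A (numRaters) is to be interested in B (numRatersPair); when B is a popular item the value approaches 1.
    * Improved co-occurrence similarity = numRatersPairs / sqrt(numRaters * numRatersPair): penalizes the weight of item B, making a popular item less likely to be similar to many items.
    * @param numRatersPairs number of users who rated both A and B
    * @param numRaters number of users who rated A
    * @param numRatersPair number of users who rated B
    * @return improved co-occurrence similarity
    */
  def cooccurrence(numRatersPairs:Long,numRaters:Long,numRatersPair:Long):Double = {
    // Convert to Double before multiplying so the product cannot overflow Long
    numRatersPairs / math.sqrt(numRaters.toDouble * numRatersPair)
  }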

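4.2.2 Pearson correlation coefficient

/**
    * Pearson correlation coefficient = covariance of the two variables / product of their standard deviations
    * @param size number of users who rated both items
    * @param dotProduct sum of rating * ratingPair over those users
    * @param ratingSum sum of the first item's ratings
    * @param ratingPairSum sum of the second item's ratings
    * @param ratingNorm sum of the squared ratings of the first item
    * @param ratingNormPair sum of the squared ratings of the second item
    * @return correlation in [-1, 1], or 0 when the denominator is 0
    */
  def correlation(size:Double,dotProduct:Double,ratingSum:Double,ratingPairSum:Double,ratingNorm:Double,ratingNormPair:Double):Double = {
    val numerator = size * dotProduct - ratingSum * ratingPairSum
    val denominator = math.sqrt(size * ratingNorm - ratingSum * ratingSum) * math.sqrt(size * ratingNormPair - ratingPairSum * ratingPairSum)
    if(denominator==0) 0 else numerator/denominator
  }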

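4.2.3 Improved Pearson correlation coefficient

/**
    * Regularized correlation coefficient: the raw correlation shrunk toward a prior value,
    * with less shrinkage the more users rated both items
    * @param size number of users who rated both items
    * @param dotProduct sum of rating * ratingPair over those users
    * @param ratingSum sum of the first item's ratings
    * @param ratingPairSum sum of the second item's ratings
    * @param ratingNorm sum of the squared ratings of the first item
    * @param ratingNormPair sum of the squared ratings of the second item
    * @param virtualCount number of virtual observations carrying the prior; larger values shrink more strongly
    * @param priorCorrelation prior correlation the result is shrunk toward
    * @return regularized correlation
    */
  def regularCorrelation(size:Double,dotProduct:Double,ratingSum:Double,ratingPairSum:Double,ratingNorm:Double,ratingNormPair:Double,virtualCount:Double,priorCorrelation:Double):Double = {
    val unregularizedCorrelation = correlation(size, dotProduct, ratingSum, ratingPairSum, ratingNorm, ratingNormPair)
    val w = size/(size + virtualCount)
    w * unregularizedCorrelation + (1 - w) * priorCorrelation
  }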
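To see what the regularization does, the weighting step can be isolated in a small sketch. `ShrinkageDemo` and `shrink` are hypothetical names. The values virtualCount = 10 and prior = 0 are not stated in the article; they are assumed here because they reproduce the sample output in section 4.4.1, where the pair (27, 65) has size 5, corr 0.7484551991837488 and regCorr 0.24948506639458293.

```scala
object ShrinkageDemo {
  // Blend the raw correlation with the prior; the weight of the raw value grows with size.
  def shrink(raw: Double, size: Double, virtualCount: Double, prior: Double): Double = {
    val w = size / (size + virtualCount)
    w * raw + (1 - w) * prior
  }

  def main(args: Array[String]): Unit = {
    // Pair (27, 65) from the sample output: 5 shared raters, so w = 5 / 15 and the
    // correlation is cut to about a third (assuming virtualCount = 10, prior = 0).
    println(shrink(0.7484551991837488, 5, 10, 0.0))
    // With 90 shared raters the same raw value is barely changed (w = 0.9).
    println(shrink(0.7484551991837488, 90, 10, 0.0))
  }
}
```

Pairs supported by only a few shared raters are pulled strongly toward the prior, so a high correlation computed from three or four users no longer outranks a slightly lower one computed from many.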

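4.2.4 Cosine similarity

/**
    * Cosine similarity
    * @param dotProduct sum of rating * ratingPair over the users who rated both items
    * @param ratingNorm L2 norm of the first item's ratings (square root of the sum of squares)
    * @param ratingNormPair L2 norm of the second item's ratings
    * @return cosine similarity
    */
  def cosineSimilarity(dotProduct:Double, ratingNorm:Double, ratingNormPair:Double):Double = {
    dotProduct/(ratingNorm * ratingNormPair)
  }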

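4.2.5 Improved cosine similarity

/**
    * Improved cosine similarity.
    * Takes into account the number of users the two vectors share and the sizes of vectors A and B
    * @param dotProduct sum of rating * ratingPair over the users who rated both items
    * @param ratingNorm L2 norm of the first item's ratings
    * @param ratingNormPair L2 norm of the second item's ratings
    * @param numPairs number of users who rated both items
    * @param num number of users who rated the first item
    * @param numPair number of users who rated the second item
    * @return improved cosine similarity
    */
  def improvedCosineSimilarity(dotProduct:Double,ratingNorm:Double,ratingNormPair:Double,numPairs:Long,num:Long,numPair:Long):Double = {
    dotProduct * numPairs / (ratingNorm * ratingNormPair * num * math.log10(10 + numPair))
  }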

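4.2.6 Jaccard similarity

/**
    * Jaccard similarity
    * @param size number of users who rated both items
    * @param numRaters number of users who rated the first item
    * @param numRatersPair number of users who rated the second item
    * @return Jaccard similarity
    */
  def jaccardSimilarity(size:Double,numRaters:Double,numRatersPair:Double):Double = {
    size/(numRaters + numRatersPair - size)
  }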
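As a sanity check, the measures can be evaluated by hand for one row of the sample output in section 4.4.1. The sketch below restates four of the functions in a standalone object (`SimilarityCheck` is a hypothetical name) and feeds them the statistics of the pair (3, 57): size 4, dotProduct 8.0, ratingPowSum 4.0, ratingPairPowSum 20.0, numRaters 13, numRatersPair 12.

```scala
object SimilarityCheck {
  def cooccurrence(both: Long, numRaters: Long, numRatersPair: Long): Double =
    both / math.sqrt(numRaters.toDouble * numRatersPair)

  def cosineSimilarity(dotProduct: Double, norm: Double, normPair: Double): Double =
    dotProduct / (norm * normPair)

  def improvedCosineSimilarity(dotProduct: Double, norm: Double, normPair: Double,
                               both: Long, numRaters: Long, numRatersPair: Long): Double =
    dotProduct * both / (norm * normPair * numRaters * math.log10(10 + numRatersPair))

  def jaccardSimilarity(both: Double, numRaters: Double, numRatersPair: Double): Double =
    both / (numRaters + numRatersPair - both)

  def main(args: Array[String]): Unit = {
    // Statistics of the pair (3, 57), copied from the sample output
    val (size, dot, powSum, pairPowSum, n, nPair) = (4L, 8.0, 4.0, 20.0, 13L, 12L)
    val norm = math.sqrt(powSum)         // L2 norm of item 3's ratings
    val normPair = math.sqrt(pairPowSum) // L2 norm of item 57's ratings

    println(cooccurrence(size, n, nPair))                                    // about 0.3203
    println(cosineSimilarity(dot, norm, normPair))                           // about 0.8944
    println(improvedCosineSimilarity(dot, norm, normPair, size, n, nPair))   // about 0.2050
    println(jaccardSimilarity(size.toDouble, n.toDouble, nPair.toDouble))    // 4 / 21, about 0.1905
  }
}
```

These agree with the cooc, cos, impCos and jac columns printed for the pair (3, 57). The Pearson value for this pair is 0 because size * dotProduct - ratingSum * ratingPairSum = 4 * 8 - 4 * 8 = 0.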

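4.3 Rating prediction code

4.3.1 Computing the item similarity matrix

The following measures are computed in turn: improved co-occurrence similarity, Pearson coefficient, improved Pearson coefficient, cosine similarity, improved cosine similarity, and Jaccard similarity. They are then combined into a weighted score, with the weights set here to coef = (0.1,0.1,0.1,0.2,0.3,0.1). In the weighted score the code subtracts the cosine and improved cosine terms, and it ranks those two measures in ascending order. Because a larger cosine value means a smaller angle and therefore greater similarity, this choice favors pairs with low cosine similarity; adding the two terms and ranking them in descending order is the conventional treatment.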

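// spark, path, parseRating, defaultParallelism, coef, PRIOR_COUNT and PRIOR_CORRELATION
// are not defined in this snippet and must be supplied by the surrounding code
import org.apache.spark.sql.functions
import org.apache.spark.sql.expressions.Window
import spark.implicits._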
    val rating = spark.read.textFile(path).map(parseRating(_)).toDF()
    rating.show(10, false)
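//    Rank each user's rated items by rating and keep ranks 1-10. orderBy("rating") sorts ascending,
//    so these are the user's 10 lowest-rated items; use functions.desc("rating") to keep the highest-rated ones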
    val userRecs = rating.select($"userId", $"movieId", $"rating", functions.row_number().over(Window.partitionBy("userId").orderBy("rating")).alias("rank"))
      .filter($"rank" <= 10)
    userRecs.show(10, false)
//    Count the users who rated each item; item2manyUser has the form (movieId, numRaters)
//    rating.groupBy($"movieId").pivot("rating").count().show(false)
    val item2manyUser = rating.groupBy($"movieId").count().toDF("movieId", "numRaters").coalesce(defaultParallelism)
    item2manyUser.show(10, false)
//    Attach each item's rater count to every rating; ratingsWithSize and ratingsWithSizePair have the form (movieId, userId, rating, timestamp, numRaters)
    val ratingsWithSize = rating.join(item2manyUser, "movieId").coalesce(defaultParallelism)
    ratingsWithSize.show(10, false)
    val ratingsWithSizePair = ratingsWithSize.toDF("movieIdPair", "userId", "ratingPair", "timestampPair", "numRatersPair")
//    Pair up the ratings each user gave to different items, keeping every unordered item pair once (movieId < movieIdPair)
    val ratingPairs = ratingsWithSize.join(ratingsWithSizePair, "userId").where($"movieId" < $"movieIdPair")
      .selectExpr("userId", "movieId", "rating", "numRaters", "movieIdPair", "ratingPair", "numRatersPair", "rating * ratingPair as product", "pow(rating,2) as ratingPow", "pow(ratingPair,2) as ratingPairPow")
      .coalesce(defaultParallelism)
    ratingPairs.show(10, false)
//    Aggregate, per item pair, the statistics needed by the similarity measures
    val vectorCals = ratingPairs.groupBy("movieId", "movieIdPair")
      .agg(functions.count("userId").alias("size"),
        functions.sum("product").alias("dotProduct"),
        functions.sum("rating").alias("ratingSum"),
        functions.sum("ratingPair").alias("ratingPairSum"),
        functions.sum("ratingPow").alias("ratingPowSum"),
        functions.sum("ratingPairPow").alias("ratingPairPowSum"),
        functions.first("numRaters").alias("numRaters"),
        functions.first("numRatersPair").alias("numRatersPair"))
//      .agg(Map("userId"->"count","product"->"sum","rating"->"sum","ratingPair"->"sum","ratingPow"->"sum","ratingPairPow"->"sum","numRaters"->"first","numRatersPair"->"first"))
//      .toDF("movieId","movieIdPair","size","dotProduct","ratingSum","ratingPairSum","ratingPowSum","ratingPairPowSum","numRaters","numRatersPair")
      .coalesce(defaultParallelism)
    vectorCals.show(10, false)
//    Compute the similarity measures for each item pair (improved co-occurrence, Pearson, improved Pearson, cosine, improved cosine, Jaccard) and their weighted score
    val similar = vectorCals.map(row => {
      val movieId = row.getAs[Int]("movieId")
      val movieIdPair = row.getAs[Int]("movieIdPair")
      val size = row.getAs[Long]("size")
      val dotProduct = row.getAs[Double]("dotProduct")
      val ratingSum = row.getAs[Double]("ratingSum")
      val ratingPairSum = row.getAs[Double]("ratingPairSum")
      val ratingPowSum = row.getAs[Double]("ratingPowSum")
      val ratingPairPowSum = row.getAs[Double]("ratingPairPowSum")
      val numRaters = row.getAs[Long]("numRaters")
      val numRatersPair = row.getAs[Long]("numRatersPair")

      val cooc = cooccurrence(size, numRaters, numRatersPair)
      val corr = correlation(size, dotProduct, ratingSum, ratingPairSum, ratingPowSum, ratingPairPowSum)
      val regCorr = regularCorrelation(size, dotProduct, ratingSum, ratingPairSum, ratingPowSum, ratingPairPowSum, PRIOR_COUNT, PRIOR_CORRELATION)
      val cos = cosineSimilarity(dotProduct, math.sqrt(ratingPowSum), math.sqrt(ratingPairPowSum))
      val impCos = improvedCosineSimilarity(dotProduct, math.sqrt(ratingPowSum), math.sqrt(ratingPairPowSum), size, numRaters, numRatersPair)
      val jac = jaccardSimilarity(size, numRaters, numRatersPair)
      val score = coef(0)*cooc + coef(1)*corr + coef(2)*regCorr - coef(3)*cos - coef(4)*impCos + coef(5)*jac
      (movieId, movieIdPair, cooc, corr, regCorr, cos, impCos, jac, score)
    }).toDF("movieId", "movieIdPair", "cooc", "corr", "regCorr", "cos", "impCos", "jac", "score")
    similar.show(10, false)
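//    Mirror the half (triangular) matrix so that similarities are available for item pairs in both orders.
//    Note: union matches columns by position, not by name, so the renamed copy contributes the same rows again
//    under swapped column names. The sums below are therefore twice each pair's value, and movieId holds the
//    larger id of each pair. unionByName (available since Spark 2.3.0) matches by name and yields the mirrored pairs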
    val similarities = similar.withColumnRenamed("movieId", "movieIdRe")
      .withColumnRenamed("movieIdPair", "movieId")
      .withColumnRenamed("movieIdRe","movieIdPair")
      .union(similar)
      .repartition(defaultParallelism)
    similarities.show(10, false)
    val ItemPairSim = similarities.groupBy("movieId","movieIdPair").agg(
      functions.sum("cooc").alias("coocSim"),
      functions.sum("corr").alias("corrSim"),
      functions.sum("regCorr").alias("regCorrSim"),
      functions.sum("cos").alias("cosSim"),
      functions.sum("impCos").alias("impCosSim"),
      functions.sum("jac").alias("jacSim"),
      functions.sum("score").alias("scores")
    )
    val simCols = Array("coocSim","corrSim","regCorrSim","cosSim","impCosSim","jacSim","scores")
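    // foreach rather than map: the body only prints and its result is not used
    simCols.foreach(simCol =>{
      simCol match{
        case "coocSim" => println("Co-occurrence similarity:")
        case "corrSim" => println("Pearson correlation coefficient:")
        case "regCorrSim" => println("Improved Pearson correlation coefficient:")
        case "cosSim" => println("Cosine similarity:")
        case "impCosSim" => println("Improved cosine similarity:")
        case "jacSim" => println("Jaccard similarity:")
        case _ => println("Weighted similarity:")
      }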
      val itemPairsCol = if(simCol.equals("cosSim")||simCol.equals("impCosSim")){
        ItemPairSim.select( $"movieId", $"movieIdPair", functions.row_number().over(Window.partitionBy("movieId").orderBy(simCol)).alias("rank"))
          .filter($"rank" <= 10)
      }else{
        ItemPairSim.select( $"movieId", $"movieIdPair", functions.row_number().over(Window.partitionBy("movieId").orderBy(functions.desc(simCol))).alias("rank"))
          .filter($"rank" <= 10)
      }
      println(itemPairsCol.select("movieId", "movieIdPair").where($"movieId" === 15).collectAsList())
    })
    ItemPairSim.where($"movieId" === 15).show(10,false)

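4.3.2 Computing predicted user ratings

Following section 3 (predicting user ratings), the 10 items with the highest predicted score for each user are taken as the final recommendations.

//    Join the user ratings with the item pairs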
    val userRating = rating.join(similar, "movieId")
      .selectExpr("userId", "movieId", "movieIdPair", "cooc", "cooc * rating as coocMeasure",
        "corr", "corr * rating as corrMeasure", "regCorr", "regCorr * rating as regCorrMeasure",
        "cos", "cos * rating as cosMeasure", "impCos", "impCos * rating as impCosMeasure", "jac", "jac * rating as jacMeasure", "score", "score * rating as scoreMeasure")
      .coalesce(defaultParallelism)
    userRating.show(10, false)
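//    Predict each user's score for every item. Each *Score column below is sum(similarity) / sum(similarity * rating),
//    which is the reciprocal of the similarity-weighted average rating sum(similarity * rating) / sum(similarity)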
    val userScore = userRating.groupBy("userId", "movieIdPair")
      .agg(functions.sum("cooc").alias("coocSum"),
        functions.sum("coocMeasure").alias("coocMeasureSum"),
        functions.sum("corr").alias("corrSum"),
        functions.sum("corrMeasure").alias("corrMeasureSum"),
        functions.sum("regCorr").alias("regCorrSum"),
        functions.sum("regCorrMeasure").alias("regCorrMeasureSum"),
        functions.sum("cos").alias("cosSum"),
        functions.sum("cosMeasure").alias("cosMeasureSum"),
        functions.sum("impCos").alias("impCosSum"),
        functions.sum("impCosMeasure").alias("impCosMeasureSum"),
        functions.sum("jac").alias("jacSum"),
        functions.sum("jacMeasure").alias("jacMeasureSum"),
        functions.sum("score").alias("score"),
        functions.sum("scoreMeasure").alias("scoreMeasure")
      )
      .selectExpr("userId", "movieIdPair", "coocSum/coocMeasureSum as coocScore",
        "corrSum/corrMeasureSum as corrScore", "regCorrSum/regCorrMeasureSum as regCorrScore",
        "cosSum/cosMeasureSum as cosScore", "impCosSum/impCosMeasureSum as impCosScore",
        "jacSum/jacMeasureSum as jacScore","score/scoreMeasure as scores")
      .coalesce(defaultParallelism)
//    Keep the RANKS highest-scoring items for each user (RANKS is defined outside this snippet)
    val userRanks = userScore
      .select($"userId", $"movieIdPair", $"scores", functions.row_number().over(Window.partitionBy("userId").orderBy(functions.desc("scores"))).alias("rank"))
      .filter($"rank" <= RANKS)
    val userRecommend = userRanks.select($"userId", functions.concat_ws(":", $"movieIdPair", $"scores").alias("recommend"))
      .groupBy("userId")
      .agg(functions.collect_set("recommend"))
    userRecommend.show(10, false)

4.4 Results

4.4.1 Computing the item similarity matrix

+------+-------+------+----------+
 |userId|movieId|rating|timestamp |
 +------+-------+------+----------+
 |0     |2      |3.0   |1424380312|
 |0     |3      |1.0   |1424380312|
 |0     |5      |2.0   |1424380312|
 |0     |9      |4.0   |1424380312|
 |0     |11     |1.0   |1424380312|
 |0     |12     |2.0   |1424380312|
 |0     |15     |1.0   |1424380312|
 |0     |17     |1.0   |1424380312|
 |0     |19     |1.0   |1424380312|
 |0     |21     |1.0   |1424380312|
 +------+-------+------+----------+
 only showing top 10 rows

 +------+-------+------+----+
 |userId|movieId|rating|rank|
 +------+-------+------+----+
 |28    |1      |1.0   |1   |
 |28    |3      |1.0   |2   |
 |28    |6      |1.0   |3   |
 |28    |7      |1.0   |4   |
 |28    |14     |1.0   |5   |
 |28    |15     |1.0   |6   |
 |28    |17     |1.0   |7   |
 |28    |20     |1.0   |8   |
 |28    |27     |1.0   |9   |
 |28    |29     |1.0   |10  |
 +------+-------+------+----+
 only showing top 10 rows

 +-------+---------+
 |movieId|numRaters|
 +-------+---------+
 |31     |15       |
 |85     |18       |
 |65     |11       |
 |53     |12       |
 |78     |14       |
 |34     |11       |
 |81     |16       |
 |28     |12       |
 |76     |11       |
 |26     |14       |
 +-------+---------+
 only showing top 10 rows

 +-------+------+------+----------+---------+
 |movieId|userId|rating|timestamp |numRaters|
 +-------+------+------+----------+---------+
 |2      |0     |3.0   |1424380312|19       |
 |3      |0     |1.0   |1424380312|13       |
 |5      |0     |2.0   |1424380312|13       |
 |9      |0     |4.0   |1424380312|16       |
 |11     |0     |1.0   |1424380312|12       |
 |12     |0     |2.0   |1424380312|17       |
 |15     |0     |1.0   |1424380312|19       |
 |17     |0     |1.0   |1424380312|13       |
 |19     |0     |1.0   |1424380312|17       |
 |21     |0     |1.0   |1424380312|17       |
 +-------+------+------+----------+---------+
 only showing top 10 rows

 +------+-------+------+---------+-----------+----------+-------------+-------+---------+-------------+
 |userId|movieId|rating|numRaters|movieIdPair|ratingPair|numRatersPair|product|ratingPow|ratingPairPow|
 +------+-------+------+---------+-----------+----------+-------------+-------+---------+-------------+
 |28    |0      |3.0   |16       |1          |1.0       |13           |3.0    |9.0      |1.0          |
 |28    |0      |3.0   |16       |2          |4.0       |19           |12.0   |9.0      |16.0         |
 |28    |0      |3.0   |16       |3          |1.0       |13           |3.0    |9.0      |1.0          |
 |28    |0      |3.0   |16       |6          |1.0       |20           |3.0    |9.0      |1.0          |
 |28    |0      |3.0   |16       |7          |1.0       |16           |3.0    |9.0      |1.0          |
 |28    |0      |3.0   |16       |12         |5.0       |17           |15.0   |9.0      |25.0         |
 |28    |0      |3.0   |16       |13         |2.0       |16           |6.0    |9.0      |4.0          |
 |28    |0      |3.0   |16       |14         |1.0       |18           |3.0    |9.0      |1.0          |
 |28    |0      |3.0   |16       |15         |1.0       |19           |3.0    |9.0      |1.0          |
 |28    |0      |3.0   |16       |17         |1.0       |13           |3.0    |9.0      |1.0          |
 +------+-------+------+---------+-----------+----------+-------------+-------+---------+-------------+
 only showing top 10 rows

 +-------+-----------+----+----------+---------+-------------+------------+----------------+---------+-------------+
 |movieId|movieIdPair|size|dotProduct|ratingSum|ratingPairSum|ratingPowSum|ratingPairPowSum|numRaters|numRatersPair|
 +-------+-----------+----+----------+---------+-------------+------------+----------------+---------+-------------+
 |3      |57         |4   |8.0       |4.0      |8.0          |4.0         |20.0            |13       |12           |
 |3      |89         |3   |8.0       |3.0      |8.0          |3.0         |24.0            |13       |11           |
 |27     |65         |5   |33.0      |11.0     |11.0         |37.0        |35.0            |15       |11           |
 |36     |83         |8   |18.0      |14.0     |12.0         |30.0        |32.0            |18       |14           |
 |52     |58         |8   |32.0      |15.0     |12.0         |43.0        |26.0            |14       |15           |
 |58     |81         |10  |23.0      |15.0     |18.0         |33.0        |48.0            |15       |16           |
 |63     |81         |9   |24.0      |15.0     |16.0         |31.0        |44.0            |16       |16           |
 |7      |55         |8   |14.0      |13.0     |9.0          |35.0        |11.0            |16       |19           |
 |18     |68         |10  |56.0      |26.0     |19.0         |82.0        |51.0            |15       |19           |
 |18     |95         |8   |32.0      |19.0     |15.0         |57.0        |35.0            |15       |17           |
 +-------+-----------+----+----------+---------+-------------+------------+----------------+---------+-------------+
 only showing top 10 rows

 +-------+-----------+-------------------+--------------------+--------------------+------------------+-------------------+-------------------+--------------------+
 |movieId|movieIdPair|cooc               |corr                |regCorr             |cos               |impCos             |jac                |score               |
 +-------+-----------+-------------------+--------------------+--------------------+------------------+-------------------+-------------------+--------------------+
 |3      |57         |0.32025630761017426|0.0                 |0.0                 |0.8944271909999159|0.20500872816969473|0.19047619047619047|-0.1702671877946361 |
 |3      |89         |0.2508726030021272 |0.0                 |0.0                 |0.9428090415820635|0.1645501000890719 |0.14285714285714285|-0.18426814947149298|
 |27     |65         |0.3892494720807615 |0.7484551991837488  |0.24948506639458293 |0.9170205237216019|0.23118215648843166|0.23809523809523808|-0.06642073030589295|
 |36     |83         |0.5039526306789696 |-0.3418817293789138 |-0.15194743527951723|0.5809475019311126|0.18707200893898498|0.3333333333333333 |-0.10463208979919748|
 |52     |58         |0.5520524474738834 |0.8708635721768008  |0.38705047652302255 |0.9570377672873267|0.3912032853853935 |0.38095238095238093|-0.05158141326523652|
 |58     |81         |0.6454972243679028 |-0.3125381539589969 |-0.15626907697949846|0.577896743774047 |0.2722768569470682 |0.47619047619047616|-0.08435531125789389|
 |63     |81         |0.5625             |-0.27602622373694163|-0.1307492638753934 |0.6498364332886588|0.25833206982242696|0.391304347826087  |-0.11363358680047596|
 |7      |55         |0.4588314677411235 |-0.17937400083354382|-0.07972177814824169|0.7135060680126758|0.2439507128147666 |0.2962962962962963 |-0.1366535993117721 |
 |18     |68         |0.5923488777590923 |0.45057755628547236 |0.22528877814273618 |0.8659563730239938|0.3947654807460637 |0.4166666666666667 |-0.08146606427655445|
 |18     |95         |0.5009794328681196 |-0.40119438904232335|-0.1783086173521437 |0.7164378605434321|0.26694834804228634|0.3333333333333333 |-0.1645577672073404 |
 +-------+-----------+-------------------+--------------------+--------------------+------------------+-------------------+-------------------+--------------------+
 only showing top 10 rows

 +-----------+-------+-------------------+--------------------+--------------------+------------------+-------------------+-------------------+--------------------+
 |movieIdPair|movieId|cooc               |corr                |regCorr             |cos               |impCos             |jac                |score               |
 +-----------+-------+-------------------+--------------------+--------------------+------------------+-------------------+-------------------+--------------------+
 |25         |51     |0.5976143046671968 |-0.45014069095231984|-0.22507034547615992|0.6819309069874763|0.32975864603850563|0.4166666666666667 |-0.1597401150518419 |
 |47         |87     |0.3450327796711771 |0.0                 |0.0                 |0.9233805168766386|0.2359033677994821 |0.20833333333333334|-0.17927716908138797|
 |7          |96     |0.5809475019311126 |-0.48176685558046484|-0.22820535264337807|0.5507685172519937|0.22161701434423176|0.4090909090909091 |-0.1077230965647595 |
 |56         |76     |0.5118906968889915 |-0.6454972243679029 |-0.26579297473972474|0.7105597124064275|0.22128206127090347|0.3333333333333333 |-0.1817698444177535 |
 |32         |43     |0.4472135954999579 |0.090075469822209   |0.03377830118332838 |0.8549090976340066|0.30577460131716266|0.2857142857142857 |-0.14846460612854348|
 |10         |54     |0.3757345746510897 |0.0                 |0.0                 |0.9525793444156805|0.2662018889308347 |0.23076923076923078|-0.18664913194343136|
 |12         |75     |0.3500700210070024 |-0.5103103630798287 |-0.1701034543599429 |0.5011148285857957|0.10979158531474496|0.20833333333333334|-0.12452815428819287|
 |9          |66     |0.375              |0.0                 |0.0                 |0.8725028717782317|0.23123303162285763|0.23076923076923078|-0.1602166376886575 |
 |32         |34     |0.26111648393354675|0.0                 |0.0                 |0.8164965809277261|0.15437994744510897|0.15               |-0.15350165202572327|
 |86         |92     |0.44095855184409843|0.16666666666666669 |0.06862745098039216 |0.903696114115064 |0.2546257899447296 |0.28               |-0.13350169285731595|
 +-----------+-------+-------------------+--------------------+--------------------+------------------+-------------------+-------------------+--------------------+
 only showing top 10 rows

 Co-occurrence similarity:
 [[15,5], [15,7], [15,4], [15,6], [15,14], [15,9], [15,12], [15,10], [15,2], [15,1]]
 Pearson correlation coefficient:
 [[15,8], [15,12], [15,7], [15,4], [15,11], [15,3], [15,2], [15,1], [15,0], [15,9]]
 Improved Pearson correlation coefficient:
 [[15,12], [15,7], [15,4], [15,8], [15,11], [15,3], [15,2], [15,1], [15,0], [15,9]]
 Cosine similarity:
 [[15,9], [15,10], [15,1], [15,13], [15,6], [15,2], [15,0], [15,11], [15,7], [15,12]]
 Improved cosine similarity:
 [[15,0], [15,13], [15,2], [15,11], [15,10], [15,9], [15,1], [15,6], [15,12], [15,3]]
 Jaccard similarity:
 [[15,5], [15,7], [15,6], [15,4], [15,14], [15,9], [15,12], [15,10], [15,2], [15,1]]
 Weighted similarity:
 [[15,7], [15,12], [15,4], [15,9], [15,2], [15,1], [15,8], [15,6], [15,14], [15,11]]
 +-------+-----------+------------------+--------------------+--------------------+------------------+------------------+-------------------+--------------------+
 |movieId|movieIdPair|coocSim           |corrSim             |regCorrSim          |cosSim            |impCosSim         |jacSim             |scores              |
 +-------+-----------+------------------+--------------------+--------------------+------------------+------------------+-------------------+--------------------+
 |15     |14         |1.2977713690461004|-0.4633323746250466 |-0.25272674979547993|1.6979054399120357|0.7740279743049593|0.96               |-0.32161825581133757|
 |15     |12         |1.224112744964246 |0.663663648395968   |0.34763333963598325 |1.6799278063066676|0.7433079855995841|0.88               |-0.159436983641589  |
 |15     |1          |1.1453125733564   |-0.25               |-0.11842105263157894|1.5320646925708532|0.725288309546159 |0.782608695652174  |-0.28978854017510147|
 |15     |5          |1.399826478546711 |-0.43852900965351466|-0.22970567172326958|1.7172593257387583|0.993618416740892 |1.0476190476190477 |-0.35885440092921705|
 |15     |6          |1.3337718577107005|-0.5564866749122019 |-0.3145359466895054 |1.6412198797244364|0.7294819353921144|1.0                |-0.3008136329516223 |
 |15     |2          |1.1578947368421053|-0.1437770309379179 |-0.07531177811033796|1.6470588235294117|0.6520525690591884|0.8148148148148148 |-0.26818397968129093|
 |15     |8          |0.8671099695241199|0.8164965809277261  |0.2721655269759087  |1.6853174301284732|0.8231672678073898|0.47619047619047616|-0.2931983633870409 |
 |15     |11         |0.9271726499455306|0.0                 |0.0                 |1.6630436812405998|0.6633685326776863|0.5833333333333334 |-0.32223536439020606|
 |15     |10         |1.1846977555181846|-0.6965260331469925 |-0.34826301657349623|1.5212174611483278|0.6934808277609235|0.8333333333333334 |-0.33163020331150633|
 |15     |7          |1.3764944032233704|0.6112274566280462  |0.3333967945243888  |1.6711454971746993|0.8570574663543986|1.0434782608695652 |-0.15053882172976585|
 +-------+-----------+------------------+--------------------+--------------------+------------------+------------------+-------------------+--------------------+
 only showing top 10 rows

4.4.2 Computing predicted user ratings

+------+-------+-----------+-------------------+-------------------+----+-----------+-------+--------------+------------------+------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+
 |userId|movieId|movieIdPair|cooc               |coocMeasure        |corr|corrMeasure|regCorr|regCorrMeasure|cos               |cosMeasure        |impCos             |impCosMeasure      |jac                |jacMeasure         |score              |scoreMeasure       |
 +------+-------+-----------+-------------------+-------------------+----+-----------+-------+--------------+------------------+------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+
 |29    |3      |57         |0.32025630761017426|0.32025630761017426|0.0 |0.0        |0.0    |0.0           |0.8944271909999159|0.8944271909999159|0.20500872816969473|0.20500872816969473|0.19047619047619047|0.19047619047619047|-0.1702671877946361|-0.1702671877946361|
 |28    |3      |57         |0.32025630761017426|0.32025630761017426|0.0 |0.0        |0.0    |0.0           |0.8944271909999159|0.8944271909999159|0.20500872816969473|0.20500872816969473|0.19047619047619047|0.19047619047619047|-0.1702671877946361|-0.1702671877946361|
 |26    |3      |57         |0.32025630761017426|0.32025630761017426|0.0 |0.0        |0.0    |0.0           |0.8944271909999159|0.8944271909999159|0.20500872816969473|0.20500872816969473|0.19047619047619047|0.19047619047619047|-0.1702671877946361|-0.1702671877946361|
 |22    |3      |57         |0.32025630761017426|0.6405126152203485 |0.0 |0.0        |0.0    |0.0           |0.8944271909999159|1.7888543819998317|0.20500872816969473|0.41001745633938946|0.19047619047619047|0.38095238095238093|-0.1702671877946361|-0.3405343755892722|
 |21    |3      |57         |0.32025630761017426|0.32025630761017426|0.0 |0.0        |0.0    |0.0           |0.8944271909999159|0.8944271909999159|0.20500872816969473|0.20500872816969473|0.19047619047619047|0.19047619047619047|-0.1702671877946361|-0.1702671877946361|
 |17    |3      |57         |0.32025630761017426|0.32025630761017426|0.0 |0.0        |0.0    |0.0           |0.8944271909999159|0.8944271909999159|0.20500872816969473|0.20500872816969473|0.19047619047619047|0.19047619047619047|-0.1702671877946361|-0.1702671877946361|
 |14    |3      |57         |0.32025630761017426|0.9607689228305227 |0.0 |0.0        |0.0    |0.0           |0.8944271909999159|2.6832815729997477|0.20500872816969473|0.6150261845090842 |0.19047619047619047|0.5714285714285714 |-0.1702671877946361|-0.5108015633839083|
 |13    |3      |57         |0.32025630761017426|0.32025630761017426|0.0 |0.0        |0.0    |0.0           |0.8944271909999159|0.8944271909999159|0.20500872816969473|0.20500872816969473|0.19047619047619047|0.19047619047619047|-0.1702671877946361|-0.1702671877946361|
 |9     |3      |57         |0.32025630761017426|0.32025630761017426|0.0 |0.0        |0.0    |0.0           |0.8944271909999159|0.8944271909999159|0.20500872816969473|0.20500872816969473|0.19047619047619047|0.19047619047619047|-0.1702671877946361|-0.1702671877946361|
 |8     |3      |57         |0.32025630761017426|0.6405126152203485 |0.0 |0.0        |0.0    |0.0           |0.8944271909999159|1.7888543819998317|0.20500872816969473|0.41001745633938946|0.19047619047619047|0.38095238095238093|-0.1702671877946361|-0.3405343755892722|
 +------+-------+-----------+-------------------+-------------------+----+-----------+-------+--------------+------------------+------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+
 only showing top 10 rows+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 |userId|collect_set(recommend)                                                                                                                                                                                                             |
 +------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 |28    |[57:0.6823095186850562, 81:0.7074190948255319, 82:0.672283640715388, 12:0.636297470883653, 92:0.647256075399604, 38:0.6347904579454706, 49:0.6254644038108939, 89:0.6941355224210544, 80:0.6154978795267612, 40:0.6156027460005857]|
 |26    |[3:1.0, 73:0.5968684282344038, 6:0.6647065655763444, 18:0.5799619983007535, 1:1.0, 2:1.0, 21:0.5799845694216877, 5:0.8498976850090274, 7:0.5942352082433142, 4:1.0]                                                                |
 |27    |[9:1.0, 18:1.0, 10:1.0, 3:1.0, 13:1.0, 11:1.0, 8:1.0, 6:1.0, 2:1.0, 4:1.0]                                                                                                                                                         |
 |12    |[7:1.0, 14:0.8059733660850584, 5:1.0, 3:1.0, 10:0.8293788719613415, 15:0.8349960681296311, 16:0.7700888581918046, 6:1.0, 4:1.0, 13:0.8092037847999061]                                                                             |
 |22    |[21:0.7304053132054968, 14:0.7261359695697873, 22:0.7789954217249625, 3:1.0, 15:0.7439260230313952, 24:0.7289206702134335, 18:0.7695442506930248, 2:1.0, 1:1.0, 17:0.7492600967934236]                                             |
 |1     |[56:0.7346282753379684, 21:0.761510986584304, 68:0.7437219860490558, 20:0.7619575246355602, 77:0.7191257119486413, 4:0.7228603942815985, 19:0.7429016915005017, 28:0.7400829561035526, 86:0.717171677819262, 62:0.7218697862031331]|
 |13    |[14:0.8650749933539622, 7:0.8879325873718314, 3:1.0, 8:0.904542655291971, 5:0.903713357971985, 12:0.893690744991781, 2:1.0, 1:1.0, 11:0.9145180452679533, 4:1.0]                                                                   |
 |6     |[40:0.8284063212375322, 43:0.8278553890046102, 12:0.8405217298206186, 25:0.8331053495744748, 14:0.9206849112675594, 42:0.856004729785749, 61:0.8413345199040396, 1:1.0, 2:1.0, 58:0.8419910885809283]                              |
 |16    |[28:0.7777501297286809, 5:1.0, 3:1.0, 22:0.7936015217150886, 51:0.767085271204479, 45:0.7314911580677125, 18:0.767613647131584, 21:0.8171576034323195, 24:0.8137831859252895, 4:1.0]                                               |
 |3     |[22:0.7082578422647425, 7:1.0, 5:1.0, 3:1.0, 8:0.7233338418664426, 6:1.0, 1:1.0, 2:1.0, 4:1.0, 18:0.7594820850748755]                                                                                                              |
 +------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 only showing top 10 rows

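II. ALS-Based Collaborative Filtering

1. Basic idea

ALS infers each user's preferences from the ratings that all users have given to products, and recommends suitable products to each user accordingly. Unlike user-based or item-based collaborative filtering, which predict ratings and make recommendations by computing similarities, it makes its predictions through matrix factorization.

2. Solving with alternating least squares (ALS)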

Each row of the user rating matrix represents a user, each column represents an item, and each element is the rating that a user gave to an item. The core assumption of ALS is that the rating matrix $A$ is approximately low-rank: an $m \times n$ rating matrix $A$ can be approximated by the product of two small matrices $U$ ($m \times k$) and $V$ ($n \times k$), with $k \ll \min(m, n)$:

$$A_{m \times n} \approx U_{m \times k} V_{n \times k}^{T}$$

The rating matrix $A_{m \times n}$ can therefore be represented by the user preference feature matrix $U_{m \times k}$ and the product feature matrix $V_{n \times k}$.
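
To make the factorization concrete, here is a minimal sketch of how a predicted rating is read off two factor matrices: the entry for user i and item j is the dot product of row i of U and row j of V. The factor values, the sizes (m = 2, n = 3, k = 2) and the object name `FactorPredictionSketch` are made up for illustration and do not come from the trained model in this article.

```scala
object FactorPredictionSketch {
  // Hypothetical user factors U (m = 2 users, k = 2 latent features).
  val U: Array[Array[Double]] = Array(
    Array(1.0, 0.5),
    Array(0.5, 2.0)
  )
  // Hypothetical item factors V (n = 3 items, k = 2 latent features).
  val V: Array[Array[Double]] = Array(
    Array(2.0, 1.0),
    Array(0.5, 2.0),
    Array(1.0, 0.0)
  )

  // Predicted rating = dot product of the user's row of U and the item's row of V,
  // i.e. entry (user, item) of U * V^T.
  def predict(user: Int, item: Int): Double =
    U(user).zip(V(item)).map { case (x, y) => x * y }.sum

  def main(args: Array[String]): Unit = {
    // Print the full m x n matrix of predicted ratings, one user per line.
    for (i <- U.indices)
      println(V.indices.map(j => predict(i, j)).mkString("\t"))
  }
}
```

With these toy factors, user 0 gets the predictions 2.5, 1.5 and 1.0 for the three items, and user 1 gets 3.0, 4.25 and 0.5. Only the two small matrices need to be stored, not the full m × n rating matrix.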

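To make the low-rank product $UV^{T}$ approximate $A$ as closely as possible, we minimize the squared-error loss over the observed ratings:

$$\min_{U, V} \sum_{(i,j) \in \Omega} \left(a_{ij} - u_i^{T} v_j\right)^2$$

where $\Omega$ is the set of (user, item) pairs with an observed rating, $a_{ij}$ is the rating of user $i$ for item $j$, and $u_i$ and $v_j$ are the $i$-th row of $U$ and the $j$-th row of $V$, written as column vectors.

A regularization term is usually added to the loss to avoid overfitting. With L2 regularization the objective becomes:

$$\min_{U, V} \sum_{(i,j) \in \Omega} \left(a_{ij} - u_i^{T} v_j\right)^2 + \lambda \left( \sum_{i} \lVert u_i \rVert^2 + \sum_{j} \lVert v_j \rVert^2 \right)$$

This turns collaborative filtering into an optimization problem, which is solved with alternating least squares (ALS).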
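
The alternating scheme can be sketched in a few lines for the simplest case, rank k = 1, where each user and each item has a single scalar factor. With the item factors fixed, the L2-regularized squared-error objective is quadratic in each user factor and has the closed-form minimizer $u_i = \sum_j a_{ij} v_j / (\sum_j v_j^2 + \lambda)$, summing over the items that user i has rated; the item update is symmetric. The toy ratings, the value of `lambda` and all names below are assumptions for illustration. This is not how Spark's ALS is implemented, only a small instance of the same idea.

```scala
object Rank1AlsSketch {
  // Observed ratings as (userIndex, itemIndex, rating); toy data, not from the article.
  val ratings: Seq[(Int, Int, Double)] = Seq(
    (0, 0, 5.0), (0, 1, 3.0),
    (1, 0, 4.0), (1, 2, 1.0),
    (2, 1, 2.0), (2, 2, 1.0)
  )
  val numUsers = 3
  val numItems = 3
  val lambda = 0.01 // L2 regularization strength (assumed value)

  // Regularized squared-error loss for rank-1 factors u (users) and v (items).
  def loss(u: Array[Double], v: Array[Double]): Double = {
    val fit = ratings.map { case (i, j, a) => math.pow(a - u(i) * v(j), 2) }.sum
    fit + lambda * (u.map(x => x * x).sum + v.map(x => x * x).sum)
  }

  // Closed-form update of one side while the other side is held fixed.
  // `obs` holds (indexToUpdate, fixedIndex, rating) triples.
  def solve(obs: Seq[(Int, Int, Double)], fixed: Array[Double], size: Int): Array[Double] = {
    val byIndex = obs.groupBy(_._1)
    Array.tabulate(size) { i =>
      val rows = byIndex.getOrElse(i, Seq.empty)
      val numerator = rows.map { case (_, j, a) => a * fixed(j) }.sum
      val denominator = rows.map { case (_, j, _) => fixed(j) * fixed(j) }.sum + lambda
      numerator / denominator
    }
  }

  // Runs the alternating updates and returns the factors plus the loss after each iteration.
  def run(iterations: Int): (Array[Double], Array[Double], Seq[Double]) = {
    var u = Array.fill(numUsers)(1.0)
    var v = Array.fill(numItems)(1.0)
    val losses = scala.collection.mutable.ArrayBuffer(loss(u, v))
    for (_ <- 1 to iterations) {
      u = solve(ratings, v, numUsers) // fix V, update U
      v = solve(ratings.map { case (i, j, a) => (j, i, a) }, u, numItems) // fix U, update V
      losses += loss(u, v)
    }
    (u, v, losses.toSeq)
  }

  def main(args: Array[String]): Unit = {
    val (u, v, losses) = run(10)
    println(s"loss per iteration: ${losses.mkString(", ")}")
    println(s"predicted rating for user 0, item 2: ${u(0) * v(2)}")
  }
}
```

Because each half-step exactly minimizes the objective over one block of variables, the recorded loss never increases from one iteration to the next. That is the property that makes the alternating approach attractive even though the joint problem in U and V is not convex.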

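3. Implementation

The environment and dependencies are the same as above.

3.1 Building the ALS collaborative filtering model and predicting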

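    import org.apache.spark.ml.recommendation.ALS

    // `training` and `test` are assumed to be DataFrames with userId, movieId and rating
    // columns (for example, a random split of the ratings); their construction is not shown here.
    val als = new ALS()
      .setMaxIter(5)       // number of alternating iterations
      .setRegParam(0.01)   // L2 regularization strength
      .setUserCol("userId")
      .setItemCol("movieId")
      .setRatingCol("rating")
    val model = als.fit(training)

    // Drop rows whose prediction is NaN (users or items not seen during training),
    // so that the evaluation metric below is not NaN.
    model.setColdStartStrategy("drop")
    val predictions = model.transform(test)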

3.2 Model evaluation

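    import org.apache.spark.ml.evaluation.RegressionEvaluator

    // Compare the predicted ratings with the true ratings on the test set using RMSE.
    val evaluator = new RegressionEvaluator()
      .setMetricName("rmse")
      .setLabelCol("rating")
      .setPredictionCol("prediction")
    val rmse = evaluator.evaluate(predictions)
    println(s"Root-mean-square error = $rmse")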
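
For readers unfamiliar with the metric, the following sketch computes the root-mean-square error by hand for a handful of (rating, prediction) pairs: the square root of the mean squared difference. The pairs and the object name `RmseSketch` are invented for illustration and are unrelated to the results reported later in this article.

```scala
object RmseSketch {
  // RMSE = sqrt( mean( (rating - prediction)^2 ) ) over all pairs.
  def rmse(pairs: Seq[(Double, Double)]): Double = {
    require(pairs.nonEmpty, "need at least one (rating, prediction) pair")
    val squaredErrors = pairs.map { case (rating, prediction) =>
      (rating - prediction) * (rating - prediction)
    }
    math.sqrt(squaredErrors.sum / pairs.size)
  }

  def main(args: Array[String]): Unit = {
    // Squared errors are 1, 4, 0 and 4, so the mean is 2.25 and the RMSE is 1.5.
    val pairs = Seq((4.0, 3.0), (2.0, 4.0), (5.0, 5.0), (1.0, 3.0))
    println(rmse(pairs)) // prints 1.5
  }
}
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones.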

3.3 Recommendation lists

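    // `ratings` is assumed to be the full ratings DataFrame from which training/test were split.

    // Generate the top 10 movie recommendations for each user
    val userRecs = model.recommendForAllUsers(10)
    userRecs.show(10, false)
    // Generate the top 10 user recommendations for each movie
    val movieRecs = model.recommendForAllItems(10)
    movieRecs.show(10, false)

    // Generate the top 10 movie recommendations for a specified set of users
    val users = ratings.select(als.getUserCol).distinct().limit(3)
    val userSubsetRecs = model.recommendForUserSubset(users, 10)
    userSubsetRecs.show(10, false)

    // Generate the top 10 user recommendations for a specified set of movies
    val movies = ratings.select(als.getItemCol).distinct().limit(3)
    val movieSubSetRecs = model.recommendForItemSubset(movies, 10)
    movieSubSetRecs.show(5, false)
  } // end of the enclosing method (its opening is not shown in this excerpt)

  // One rating record: userId::movieId::rating::timestamp
  case class Rating(userId: Int, movieId: Int, rating: Float, timestamp: Long)

  // Parse one "::"-separated line into a Rating
  def parseRating(str: String): Rating = {
    val fields = str.split("::")
    assert(fields.size == 4)
    Rating(fields(0).toInt, fields(1).toInt, fields(2).toFloat, fields(3).toLong)
  }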
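
The `parseRating` helper above fails with an exception on any malformed line. As a hedged alternative, the sketch below returns an `Option` instead, so bad lines can be skipped. The name `parseRatingSafe`, the wrapper object and the sample input lines are hypothetical; only the `Rating` fields and the `::` separator are taken from the code above.

```scala
import scala.util.Try

object ParseRatingSketch {
  case class Rating(userId: Int, movieId: Int, rating: Float, timestamp: Long)

  // Returns None instead of throwing when the line does not contain
  // exactly four fields or a field cannot be converted to a number.
  def parseRatingSafe(line: String): Option[Rating] =
    line.split("::") match {
      case Array(user, movie, rating, time) =>
        Try(Rating(user.toInt, movie.toInt, rating.toFloat, time.toLong)).toOption
      case _ => None
    }

  def main(args: Array[String]): Unit = {
    println(parseRatingSafe("0::2::3::1424380312")) // prints Some(Rating(0,2,3.0,1424380312))
    println(parseRatingSafe("not a rating line"))   // prints None
  }
}
```

Whether to fail fast or skip bad records depends on the data: for a curated sample file the assertion is reasonable, while for raw logs silently stopping the whole job on one bad line is usually undesirable.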

4. Results

Root-mean-square error = 1.927040678568387
 +------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 |userId|recommendations                                                                                                                                                         |
 +------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 |28    |[[92, 5.190725], [81, 4.86317], [4, 4.7198944], [69, 4.33059], [29, 4.2829266], [89, 4.2616525], [76, 4.1987777], [96, 4.184493], [7, 4.1226907], [2, 4.0815644]]       |
 |26    |[[51, 6.058535], [30, 5.7161875], [94, 5.0318503], [88, 4.9334908], [7, 4.8731565], [24, 4.6401963], [55, 4.472246], [53, 4.236663], [77, 4.1913238], [68, 4.061715]]   |
 |27    |[[38, 4.2577467], [46, 4.063789], [30, 3.945347], [18, 3.7809763], [23, 3.7044308], [17, 3.3986986], [69, 3.2548237], [27, 3.2114499], [1, 3.1994212], [83, 3.1642542]] |
 |12    |[[25, 5.5946026], [46, 5.5124335], [17, 5.11423], [35, 5.0823307], [64, 5.0337462], [27, 4.959539], [43, 4.428263], [1, 4.240394], [94, 3.9515357], [31, 3.9412215]]    |
 |22    |[[53, 5.5407586], [75, 5.1068535], [46, 5.0826797], [22, 5.047842], [74, 4.9705014], [52, 4.8736644], [88, 4.8529797], [87, 4.850308], [30, 4.643672], [51, 4.5106797]] |
 |1     |[[62, 3.5494595], [10, 3.4318256], [68, 3.4186132], [49, 3.3213418], [92, 3.2935987], [85, 3.0727763], [77, 2.9407728], [9, 2.8703325], [39, 2.7454283], [55, 2.435999]]|
 |13    |[[32, 4.720694], [69, 4.131916], [93, 3.9207232], [96, 3.6651685], [62, 3.439321], [4, 3.4145846], [74, 3.3329194], [53, 3.194853], [30, 2.9827442], [92, 2.9585018]]   |
 |6     |[[25, 4.8636093], [58, 3.809049], [62, 3.5082598], [43, 3.4347136], [40, 3.1993797], [37, 3.172602], [92, 3.0829635], [64, 3.0454423], [52, 2.998352], [95, 2.9680624]] |
 |16    |[[90, 5.2693963], [85, 4.881312], [54, 4.7646246], [51, 4.623986], [1, 4.4009867], [33, 3.4036496], [68, 3.381634], [94, 3.1365643], [47, 3.060517], [10, 2.9913652]]   |
 |3     |[[51, 5.0008397], [88, 3.9850216], [24, 3.3814678], [57, 3.186227], [97, 3.106503], [94, 3.0213935], [74, 3.0056891], [76, 2.9751751], [29, 2.9741306], [87, 2.9737644]]|
 +------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 only showing top 10 rows

 +-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 |movieId|recommendations                                                                                                                                                        |
 +-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 |31     |[[12, 3.9412215], [8, 3.2569442], [7, 2.8316283], [6, 2.5838141], [15, 2.1348748], [22, 2.0737545], [21, 2.0215204], [25, 1.8854598], [29, 1.4713752], [14, 1.3172331]]|
 |85     |[[16, 4.881312], [17, 4.2706757], [14, 4.2134066], [7, 3.6982865], [1, 3.0727763], [15, 2.7946656], [19, 2.5757656], [6, 2.2734714], [20, 2.2645016], [3, 2.1871538]]  |
 |65     |[[23, 4.6813827], [20, 4.5017395], [25, 3.6871593], [14, 3.652615], [22, 3.0776486], [7, 2.9361606], [6, 2.4191651], [5, 2.20444], [0, 2.1689785], [3, 2.026729]]      |
 |53     |[[22, 5.5407586], [21, 4.9554963], [8, 4.9435635], [24, 4.7219305], [26, 4.236663], [13, 3.194853], [5, 2.8868222], [20, 2.8620806], [27, 2.6194685], [28, 2.595715]]  |
 |78     |[[5, 1.3975992], [23, 1.3725746], [25, 1.2985932], [18, 1.2398205], [6, 1.1964377], [29, 1.1187494], [7, 1.1098578], [2, 1.0951111], [24, 1.0589423], [13, 1.04663]]   |
 |34     |[[14, 4.7571654], [23, 4.189182], [2, 4.0316715], [18, 3.776127], [28, 3.2931669], [25, 2.9498177], [20, 2.885182], [3, 2.7920961], [13, 2.5485854], [0, 2.4903808]]   |
 |81     |[[28, 4.86317], [11, 4.0060267], [23, 3.2592351], [18, 3.1237376], [14, 2.7114706], [13, 2.612024], [10, 2.527891], [2, 2.2649016], [9, 2.2159412], [24, 2.1401815]]   |
 |28     |[[12, 2.1206188], [7, 2.0479355], [6, 1.9925008], [15, 1.5918586], [25, 1.5583891], [8, 1.5504173], [14, 1.2316033], [0, 1.2297935], [29, 1.2055482], [5, 1.1394241]]  |
 |76     |[[28, 4.1987777], [14, 3.4017448], [10, 3.2845893], [3, 2.9751751], [0, 2.9261422], [12, 2.8340385], [18, 2.7923949], [7, 2.764282], [6, 2.289724], [16, 2.189875]]    |
 |26     |[[12, 3.3199286], [11, 3.3151531], [15, 2.6134777], [29, 2.49589], [0, 2.153885], [25, 2.1172814], [27, 1.9949272], [18, 1.3374854], [20, 1.3024286], [7, 1.1732591]]  |
 +-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 only showing top 10 rows

 +------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 |userId|recommendations                                                                                                                                                        |
 +------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 |28    |[[92, 5.190725], [81, 4.86317], [4, 4.7198944], [69, 4.33059], [29, 4.2829266], [89, 4.2616525], [76, 4.1987777], [96, 4.184493], [7, 4.1226907], [2, 4.0815644]]      |
 |26    |[[51, 6.058535], [30, 5.7161875], [94, 5.0318503], [88, 4.9334908], [7, 4.8731565], [24, 4.6401963], [55, 4.472246], [53, 4.236663], [77, 4.1913238], [68, 4.061715]]  |
 |27    |[[38, 4.2577467], [46, 4.063789], [30, 3.945347], [18, 3.7809763], [23, 3.7044308], [17, 3.3986986], [69, 3.2548237], [27, 3.2114499], [1, 3.1994212], [83, 3.1642542]]|
 +------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------++-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 |movieId|recommendations                                                                                                                                                        |
 +-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 |31     |[[12, 3.9412215], [8, 3.2569442], [7, 2.8316283], [6, 2.5838141], [15, 2.1348748], [22, 2.0737545], [21, 2.0215204], [25, 1.8854598], [29, 1.4713752], [14, 1.3172331]]|
 |85     |[[16, 4.881312], [17, 4.2706757], [14, 4.2134066], [7, 3.6982865], [1, 3.0727763], [15, 2.7946656], [19, 2.5757656], [6, 2.2734714], [20, 2.2645016], [3, 2.1871538]]  |
 |65     |[[23, 4.6813827], [20, 4.5017395], [25, 3.6871593], [14, 3.652615], [22, 3.0776486], [7, 2.9361606], [6, 2.4191651], [5, 2.20444], [0, 2.1689785], [3, 2.026729]]      |
 +-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+

References

https://glassywing.github.io/2018/04/10/spark-itemcf/