Is Collaborative Filtering Machine Learning? Is the Collaborative Filtering Algorithm Hard to Learn?

Reposted

mob6454cc714ea1 2024-01-16 15:56:30

Basic introduction:

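Collaborative filtering (CF) typically draws its data from user behavior records and uses no feature information about the users or the items themselves. Depending on whether it starts from item similarity or user similarity, it is divided into Item-CF and User-CF. Matrix factorization models were derived from it to handle the sparse co-occurrence matrix and to improve the model's ability to generalize.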
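
To make the two starting points concrete, here is a minimal sketch that computes user-user and item-item similarity from a small co-occurrence matrix. The matrix `R`, the helper `cosine_similarity`, and the choice of cosine similarity are assumptions made for illustration; the original post does not specify a similarity measure.

```python
import numpy as np

# Hypothetical user-item co-occurrence matrix (rows: users, columns: items).
# 1 means the user interacted with the item, 0 means no recorded behavior.
R = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
], dtype=float)

def cosine_similarity(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    X_unit = X / norms
    return X_unit @ X_unit.T

user_sim = cosine_similarity(R)    # User-CF: compare rows (users)
item_sim = cosine_similarity(R.T)  # Item-CF: compare columns (items)

print(np.round(user_sim, 3))
print(np.round(item_sim, 3))
```

User-CF would recommend items liked by the most similar users, while Item-CF would recommend items most similar to those the user has already interacted with.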

(2) Matrix factorization

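To address the sparsity problem described above, matrix factorization (MF) represents users and items with denser latent vectors. These latent vectors are obtained by factorizing the co-occurrence matrix that collaborative filtering builds. The factorization is usually solved with gradient descent, because the alternatives fit poorly: eigendecomposition applies only to square matrices, while singular value decomposition (SVD) requires the original matrix to be dense, and its computational cost becomes high when the matrix dimensions are large.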
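
The gradient-descent formulation can be written out explicitly. The following is a sketch of the commonly used regularized squared-error objective, in notation assumed here rather than taken from the post: $p_u$ is the latent vector of user $u$, $q_i$ that of item $i$, $r_{ui}$ an observed rating, $\mathcal{K}$ the set of observed (user, item) pairs, $\lambda$ the regularization parameter, and $\alpha$ the learning rate.

$$
L = \sum_{(u,i)\in\mathcal{K}} \Big[ \big(p_u^\top q_i - r_{ui}\big)^2 + \lambda\big(\lVert p_u\rVert^2 + \lVert q_i\rVert^2\big) \Big]
$$

For a single observed pair, with the error $e_{ui} = p_u^\top q_i - r_{ui}$, the gradients are

$$
\frac{\partial L_{ui}}{\partial p_u} = 2 e_{ui}\, q_i + 2\lambda\, p_u, \qquad
\frac{\partial L_{ui}}{\partial q_i} = 2 e_{ui}\, p_u + 2\lambda\, q_i
$$

and one stochastic gradient step updates both vectors from their old values:

$$
p_u \leftarrow p_u - \alpha\,\big(2 e_{ui}\, q_i + 2\lambda\, p_u\big), \qquad
q_i \leftarrow q_i - \alpha\,\big(2 e_{ui}\, p_u + 2\lambda\, q_i\big)
$$

This has the same form as the per-rating update in the `train` method of the code in this post, where `eui`, `alpha`, and `lamda` play the roles of $e_{ui}$, $\alpha$, and $\lambda$.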

$\begin{bmatrix} & 4.5 & 2.0 & \\ 4.0 & & 3.5 & \\ & 5.0 & & 2.0 \\ & 3.5 & 4.0 & 1.0 \end{bmatrix} = \begin{bmatrix} 1.2 & 0.8 \\ 1.4 & 0.9 \\ 1.5 & 1.0 \\ 1.2 & 0.8 \end{bmatrix} \times \begin{bmatrix} 1.5 & 1.2 & 1.0 & 0.8 \\ 1.7 & 0.6 & 1.1 & 0.4 \end{bmatrix}$
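
As a small sketch of how the factors are used, the snippet below multiplies the two factor matrices from the illustration. Because the numbers in the illustration are schematic, the product is not expected to match the observed ratings; the point is only that every cell, including the blank ones, receives a predicted score. The names `P`, `Q`, and `R_hat` are chosen here for illustration.

```python
import numpy as np

# Factor values copied from the illustration (schematic numbers).
P = np.array([[1.2, 0.8],
              [1.4, 0.9],
              [1.5, 1.0],
              [1.2, 0.8]])             # 4 users x 2 latent dimensions
Q = np.array([[1.5, 1.2, 1.0, 0.8],
              [1.7, 0.6, 1.1, 0.4]])   # 2 latent dimensions x 4 items

R_hat = P @ Q                          # dense 4 x 4 matrix of predicted scores
print(R_hat.shape)
print(np.round(R_hat, 2))

# The score for one (user, item) pair is a single dot product
# between that user's row of P and that item's column of Q.
print(np.isclose(R_hat[0, 0], P[0] @ Q[:, 0]))
```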

Code reproduction:

import numpy as np
import pandas as pd
from tqdm import tqdm

class LFM:
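    def __init__(self, data_path, K):
        """Initialize the model.

        Args:
            data_path (str): path to the ratings CSV file
            K (int): dimension of the latent vectors
        """
        # Fill missing ratings with 0; the loss is computed only over non-empty ratings
        self.user_item = pd.read_csv(data_path, index_col=0).fillna(0)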

        self.R = np.array(self.user_item)
        self.K = K

        # Randomly initialize P and Q
        M, N = self.R.shape
        self.P = np.random.rand(M, K)
        self.Q = np.random.rand(K, N)

    def train(self, max_iter, alpha, lamda):
        """Train with SGD.

        Args:
            max_iter (int): maximum number of iterations
            alpha (float): learning rate
            lamda (float): regularization parameter
        """

        M, N = self.R.shape
        for _ in tqdm(range(max_iter)):
            # Loop over every user u and item i, taking a gradient step on the latent vectors Pu and Qi
            for u in range(M):
                for i in range(N):
                    if self.R[u][i] > 0:
                        eui = np.dot(self.P[u, :], self.Q[:, i]) - self.R[u][i]
                        # Update P[u, k] and Q[k, i] simultaneously: keep the old P[u, k] for Q's gradient
                        for k in range(self.K):
                            p_uk = self.P[u][k]
                            self.P[u][k] = p_uk - alpha*(2*eui*self.Q[k][i] + 2*lamda*p_uk)
                            self.Q[k][i] = self.Q[k][i] - alpha*(2*eui*p_uk + 2*lamda*self.Q[k][i])
            
            loss = self.__loss(lamda)
            if loss < 0.001:
                break

    def predict(self, user_name, item_name):
        # Convert the user and item names to row and column indices
        user = list(self.user_item.index).index(user_name)
        item = list(self.user_item.columns).index(item_name)

        return np.dot(self.P[user, :], self.Q[:, item])


    def __loss(self, lamda):
        loss = 0
        M, N = self.R.shape
        for u in range(M):
            for i in range(N):
                # The loss is computed only over existing ratings
                if self.R[u][i] > 0:
                    loss += (np.dot(self.P[u, :], self.Q[:, i]) - self.R[u][i]) ** 2
                    # Add the regularization term
                    for k in range(self.K):
                        loss += lamda * (self.P[u][k] ** 2 + self.Q[k][i] ** 2)
        # print(loss) #
        return loss

lfm_model = LFM("data.csv", K=3)
lfm_model.train(50000, alpha=0.0001, lamda=0.0004)
print(lfm_model.R)
print(lfm_model.P.dot(lfm_model.Q))
print(lfm_model.P)
print(lfm_model.Q)
print(lfm_model.predict(1, 'E'))
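
The double loop in the class's loss method is easy to follow but slow for larger matrices. As a hedged alternative, the sketch below computes the same quantity with a boolean mask over observed ratings. The function names `masked_loss` and `loop_loss`, the random factors, and the test matrix (the ratings from the illustration with blanks stored as 0) are assumptions for illustration, not part of the original code.

```python
import numpy as np

def masked_loss(R, P, Q, lamda):
    """Regularized squared error over observed (> 0) ratings, without Python loops."""
    mask = R > 0
    err = (P @ Q - R) * mask
    # Each observed (u, i) pair adds lamda * (||P[u]||^2 + ||Q[:, i]||^2), so a
    # user's (or item's) squared norm is weighted by its number of observed ratings.
    reg_p = (mask.sum(axis=1) * (P ** 2).sum(axis=1)).sum()
    reg_q = (mask.sum(axis=0) * (Q ** 2).sum(axis=0)).sum()
    return (err ** 2).sum() + lamda * (reg_p + reg_q)

def loop_loss(R, P, Q, lamda):
    """Reference version that follows the double loop of the class's loss method."""
    loss = 0.0
    for u in range(R.shape[0]):
        for i in range(R.shape[1]):
            if R[u, i] > 0:
                loss += (P[u] @ Q[:, i] - R[u, i]) ** 2
                loss += lamda * ((P[u] ** 2).sum() + (Q[:, i] ** 2).sum())
    return loss

rng = np.random.default_rng(0)
R = np.array([[0.0, 4.5, 2.0, 0.0],
              [4.0, 0.0, 3.5, 0.0],
              [0.0, 5.0, 0.0, 2.0],
              [0.0, 3.5, 4.0, 1.0]])  # blanks stored as 0
P = rng.random((4, 3))
Q = rng.random((3, 4))

# Both versions should agree up to floating-point rounding.
print(np.isclose(masked_loss(R, P, Q, 0.0004), loop_loss(R, P, Q, 0.0004)))
```

Note that both versions treat a rating of exactly 0 as missing, the same convention the class uses after `fillna(0)`.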