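I. Background
A telecom company is one of the giants of its industry; if its customer churn rate fell by 5%, its profit would rise by 25%-85%. As the market saturates, persistently high customer acquisition costs have pushed the company against a growth ceiling, and winning new customers has become difficult. Increasing customer stickiness and extending the customer lifecycle are therefore urgent problems for the company.
Data source: https://www.kaggle.com/blastchar/telco-customer-churn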
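II. Objectives
1. Profile churned customers and generate labels for churn-prone customers;
2. Predict how customer retention changes over time and propose practical win-back recommendations.
III. Approach
Tools: Tableau, MySQL, Python, Excel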
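IV. Visual analysis of customer churn
1. Customer demographic attributes
The basic customer attributes are gender (gender), age (SeniorCitizen; 1: senior, 0: non-senior), partner (Partner), and dependents (Dependents). The churn rate for each attribute value is shown in the figure below:
The figure shows that senior customers, customers with a partner, and customers with dependents churn at a noticeably higher rate, while gender has little effect on the churn rate.
2. Service attributes
The service attributes are phone service (PhoneService), multiple lines (MultipleLines), internet service (InternetService), and online security service (OnlineSecurity). The churn rate for each attribute value is shown in the figure below:
The figure shows that customers with Fiber optic internet service and without online security service have the highest churn rate, followed by customers with DSL internet service and with online security service. Customers with neither internet service nor online security service have the lowest churn rate.
3. Billing attributes
The billing attributes are contract term (Contract), payment method (PaymentMethod), monthly charges (MonthlyCharges), and total charges (TotalCharges). The churn rate for each attribute value is shown in the figure below:
The figure shows that customers on a Month-to-month contract, paying by Electronic check, with monthly charges between 70 and 100, and with total charges under 300 have the highest churn rate.
4. Summary
Customers with the following characteristics are the most likely to churn:
1) Senior customers, customers with a partner, customers with dependents;
2) Fiber optic internet service, no online security service;
3) Month-to-month contract, payment by Electronic check, monthly charges between 70 and 100, total charges under 300.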
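The churn rates summarized above were read off charts. As a rough cross-check, the same per-segment rates can be computed in code. The sketch below is only an illustration: the rows are hand-made (not taken from the Kaggle data), and `churn_rate_by` is a hypothetical helper name; only the column names `Contract` and `Churn` come from the dataset.

```python
from collections import defaultdict

def churn_rate_by(rows, attribute):
    """Return {attribute value: share of rows whose Churn is 'Yes'}."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in rows:
        key = row[attribute]
        totals[key] += 1
        if row["Churn"] == "Yes":
            churned[key] += 1
    return {key: churned[key] / totals[key] for key in totals}

# Illustrative rows only; these are NOT real records from the dataset.
sample = [
    {"Contract": "Month-to-month", "Churn": "Yes"},
    {"Contract": "Month-to-month", "Churn": "Yes"},
    {"Contract": "Month-to-month", "Churn": "No"},
    {"Contract": "Two year", "Churn": "No"},
    {"Contract": "Two year", "Churn": "No"},
]

rates = churn_rate_by(sample, "Contract")
for value, rate in rates.items():
    print(f"{value}: {rate:.1%}")
```

On the raw data loaded with pandas, an equivalent one-liner would be something like `data.groupby('Contract')['Churn'].apply(lambda s: (s == 'Yes').mean())`, run before the `Churn` column is encoded.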
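V. Generating churn-risk level labels
1. Quantifying churn-risk scores
The more strongly an attribute value is associated with churn, the higher its churn-risk score. The scores are assigned as follows:
Use MySQL to select the customers who have not churned and compute their risk scores: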
SELECT customerID,
  IF(SeniorCitizen = 1, 2, 0) AS senior,
  IF(Partner = 'Yes', 2, 0) AS partner,
  IF(Dependents = 'Yes', 2, 0) AS dependents,
  CASE
    WHEN InternetService = 'Fiber optic' THEN 2
    WHEN InternetService = 'DSL' THEN 1
    ELSE 0
  END AS internetservice,
  CASE
    WHEN OnlineSecurity = 'No' THEN 2
    WHEN OnlineSecurity = 'Yes' THEN 1
    ELSE 0
  END AS onlinesecurity,
  CASE
    WHEN Contract = 'Month-to-month' THEN 2
    WHEN Contract = 'One year' THEN 1
    ELSE 0
  END AS contract,
  IF(PaymentMethod = 'Electronic check', 2, 0) AS paymentMethod,
  IF(MonthlyCharges >= 70 AND MonthlyCharges <= 100, 1, 0) AS monthlycharges,
  IF(TotalCharges < 300, 1, 0) AS totalcharges
FROM ha.wa_fn
WHERE Churn = 'No';  -- keep only customers who have not churned
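The query result is as follows:
2. Summing the risk scores into a final churn-risk level
Import the query result into Excel and sum the scores to obtain the final churn-risk level (churn_level):
The distribution of churn-risk levels is as follows:
The operations team can then run tiered campaigns based on the churn-risk level.
3. Adding a high-churn-risk label
For example, customers with a risk level greater than 9 are defined as high-churn-risk customers: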
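The scoring, summing, and labeling steps above were done with MySQL and Excel. As a hedged alternative, the same rules can be sketched in plain Python. The function name `churn_level` and the constant `HIGH_RISK_THRESHOLD` are hypothetical; the scoring rules mirror the SQL query above, the threshold of 9 comes from the text, and the example customer uses the values of the first row (7590-VHVEG) shown in the `data.head()` output later in this article.

```python
HIGH_RISK_THRESHOLD = 9  # from the text: levels greater than 9 are high risk

def churn_level(c):
    """Sum the per-attribute risk scores from the SQL query for one customer."""
    score = 0
    score += 2 if c["SeniorCitizen"] == 1 else 0
    score += 2 if c["Partner"] == "Yes" else 0
    score += 2 if c["Dependents"] == "Yes" else 0
    score += {"Fiber optic": 2, "DSL": 1}.get(c["InternetService"], 0)
    score += {"No": 2, "Yes": 1}.get(c["OnlineSecurity"], 0)
    score += {"Month-to-month": 2, "One year": 1}.get(c["Contract"], 0)
    score += 2 if c["PaymentMethod"] == "Electronic check" else 0
    score += 1 if 70 <= c["MonthlyCharges"] <= 100 else 0
    score += 1 if c["TotalCharges"] < 300 else 0
    return score

# Values of customer 7590-VHVEG as shown in the data.head() output.
customer = {
    "SeniorCitizen": 0, "Partner": "Yes", "Dependents": "No",
    "InternetService": "DSL", "OnlineSecurity": "No",
    "Contract": "Month-to-month", "PaymentMethod": "Electronic check",
    "MonthlyCharges": 29.85, "TotalCharges": 29.85,
}

level = churn_level(customer)
high_risk = level > HIGH_RISK_THRESHOLD
print(level, high_risk)
```

Keeping the rules in one function makes it easy to re-score customers when the weights are revised, instead of repeating the export to Excel.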
VI. Predicting customer churn with survival analysis
1. Importing modules
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
# Survival analysis modules
from lifelines import NelsonAalenFitter, CoxPHFitter, KaplanMeierFitter
from lifelines.statistics import logrank_test
# Model evaluation metrics
from sklearn.metrics import brier_score_loss
from sklearn.calibration import calibration_curve
# Initial matplotlib and pandas settings
plt.rcParams['font.sans-serif'] = ['SimHei']  # use the SimHei font so Chinese labels render
plt.rcParams['axes.unicode_minus'] = False  # render minus signs correctly
pd.set_option('display.max_columns', 30)
plt.rcParams.update({"font.family":"SimHei","font.size":14})
plt.style.use("tableau-colorblind10")
pd.set_option('display.float_format',lambda x : '%.2f' % x)  # disable scientific notation in pandas
%matplotlib inline
# Suppress warnings
import warnings
warnings.filterwarnings('ignore')
data = pd.read_csv('WA_Fn-UseC_-Telco-Customer-Churn.csv')
data_backup = data.copy()
data.head()
customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | OnlineBackup | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn | |
0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | Yes | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | No | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.50 | No |
2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | Yes | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | No | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.30 | 1840.75 | No |
4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | No | No | No | No | No | Month-to-month | Yes | Electronic check | 70.70 | 151.65 | Yes |
2. Data preprocessing
# Missing values
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 customerID 7043 non-null object
1 gender 7043 non-null object
2 SeniorCitizen 7043 non-null int64
3 Partner 7043 non-null object
4 Dependents 7043 non-null object
5 tenure 7043 non-null int64
6 PhoneService 7043 non-null object
7 MultipleLines 7043 non-null object
8 InternetService 7043 non-null object
9 OnlineSecurity 7043 non-null object
10 OnlineBackup 7043 non-null object
11 DeviceProtection 7043 non-null object
12 TechSupport 7043 non-null object
13 StreamingTV 7043 non-null object
14 StreamingMovies 7043 non-null object
15 Contract 7043 non-null object
16 PaperlessBilling 7043 non-null object
17 PaymentMethod 7043 non-null object
18 MonthlyCharges 7043 non-null float64
19 TotalCharges 7043 non-null object
20 Churn 7043 non-null object
dtypes: float64(1), int64(2), object(18)
memory usage: 1.1+ MB
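# TotalCharges is stored as strings and contains blank entries, so coerce it to numeric
# (info() reports no nulls, but a plain conversion raises an error)
data['TotalCharges']=pd.to_numeric(data['TotalCharges'],errors='coerce')
data['TotalCharges'].dtype
dtype('float64')
# The coerced column does contain missing values
data.TotalCharges.isnull().sum()
11
# Drop the rows with missing values
data.dropna(subset=['TotalCharges'],inplace=True)
# Duplicates
data.duplicated('customerID').sum()
0
# Outliers
data.describe().T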
count | mean | std | min | 25% | 50% | 75% | max | |
SeniorCitizen | 7032.00 | 0.16 | 0.37 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 |
tenure | 7032.00 | 32.42 | 24.55 | 1.00 | 9.00 | 29.00 | 55.00 | 72.00 |
MonthlyCharges | 7032.00 | 64.80 | 30.09 | 18.25 | 35.59 | 70.35 | 89.86 | 118.75 |
TotalCharges | 7032.00 | 2283.30 | 2266.77 | 18.80 | 401.45 | 1397.47 | 3794.74 | 8684.80 |
data.describe(include='object').T
count | unique | top | freq | |
customerID | 7032 | 7032 | 7590-VHVEG | 1 |
gender | 7032 | 2 | Male | 3549 |
Partner | 7032 | 2 | No | 3639 |
Dependents | 7032 | 2 | No | 4933 |
PhoneService | 7032 | 2 | Yes | 6352 |
MultipleLines | 7032 | 3 | No | 3385 |
InternetService | 7032 | 3 | Fiber optic | 3096 |
OnlineSecurity | 7032 | 3 | No | 3497 |
OnlineBackup | 7032 | 3 | No | 3087 |
DeviceProtection | 7032 | 3 | No | 3094 |
TechSupport | 7032 | 3 | No | 3472 |
StreamingTV | 7032 | 3 | No | 2809 |
StreamingMovies | 7032 | 3 | No | 2781 |
Contract | 7032 | 3 | Month-to-month | 3875 |
PaperlessBilling | 7032 | 2 | Yes | 4168 |
PaymentMethod | 7032 | 4 | Electronic check | 2365 |
Churn | 7032 | 2 | No | 5163 |
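3. Encoding categorical data
To feed the data into the model, the categorical columns must be converted to numbers. This is done here with sklearn's LabelEncoder, which maps each category to an integer code (label encoding, not one-hot encoding).
# Label-encode the categorical columns
cat_cols = ['gender', 'Partner', 'Dependents', 'PhoneService', 'MultipleLines',
'InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection',
'TechSupport', 'StreamingTV', 'StreamingMovies', 'Contract',
'PaperlessBilling', 'PaymentMethod','Churn']
lec = LabelEncoder()
# Refit the encoder on each column; assigning whole columns keeps the integer dtype
data[cat_cols] = data[cat_cols].apply(lec.fit_transform)
# Churn: No -> 0; Yes -> 1
data.head()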
customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | OnlineBackup | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn | |
0 | 7590-VHVEG | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 29.85 | 29.85 | 0 |
1 | 5575-GNVDE | 1 | 0 | 0 | 0 | 34 | 1 | 0 | 0 | 2 | 0 | 2 | 0 | 0 | 0 | 1 | 0 | 3 | 56.95 | 1889.50 | 0 |
2 | 3668-QPYBK | 1 | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 3 | 53.85 | 108.15 | 1 |
3 | 7795-CFOCW | 1 | 0 | 0 | 0 | 45 | 0 | 1 | 0 | 2 | 0 | 2 | 2 | 0 | 0 | 1 | 0 | 0 | 42.30 | 1840.75 | 0 |
4 | 9237-HQITU | 0 | 0 | 0 | 0 | 2 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 70.70 | 151.65 | 1 |
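Label encoding assigns integer codes in sorted order of the category names, so a nominal column such as PaymentMethod ends up with an arbitrary numeric ordering that a regression model will treat as a quantity. One-hot encoding avoids that at the cost of more columns. The following plain-Python sketch contrasts the two; `label_encode` and `one_hot_encode` are hypothetical helpers written for illustration, not sklearn functions.

```python
def label_encode(values):
    """Map each category to its index in the sorted list of categories."""
    classes = sorted(set(values))
    index = {c: i for i, c in enumerate(classes)}
    return [index[v] for v in values], classes

def one_hot_encode(values):
    """Map each category to a 0/1 indicator vector, one slot per category."""
    classes = sorted(set(values))
    return [[1 if v == c else 0 for c in classes] for v in values], classes

methods = [
    "Electronic check",
    "Mailed check",
    "Bank transfer (automatic)",
    "Credit card (automatic)",
]

codes, classes = label_encode(methods)
vectors, _ = one_hot_encode(methods)
print(codes)       # integer codes, ordered alphabetically by category name
print(vectors[0])  # indicator vector for "Electronic check"
```

The integer codes match the encoded table above (Electronic check is 2, Mailed check is 3, Bank transfer is 0). With pandas, `pd.get_dummies(data, columns=[...], drop_first=True)` is one common way to one-hot encode the nominal columns instead.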
4. Correlation analysis
Correlation heatmap
fig = plt.figure(figsize=(24,12),dpi=600)
# Drop the non-numeric customerID column before computing correlations
corr = data.drop('customerID',axis=1).corr()
# Tick labels are taken from the column names of corr
ax = sns.heatmap(corr, cmap="YlGnBu",
linecolor='black', linewidths=.65, annot=True, alpha=.95)
plt.show()
5. Analyzing retention with the Kaplan-Meier (KM) model
plt.figure(dpi=800)
kmf = KaplanMeierFitter()
kmf.fit(data['tenure'], event_observed=data['Churn'])
kmf.plot()
plt.title('Retain probability')
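For intuition about what KaplanMeierFitter estimates: at each time with at least one churn event, the running survival probability is multiplied by (1 - churned / at risk), and customers who are censored (still active) simply leave the at-risk pool. The sketch below is a minimal hand-rolled version on made-up durations, written only to illustrate the estimator; it is not how lifelines is implemented.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival probability)] at each time with an observed event."""
    n_at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        # d: customers who churned exactly at time t
        d = sum(1 for dur, e in zip(durations, events) if dur == t and e == 1)
        # leaving: everyone whose observation ends at t (churned or censored)
        leaving = sum(1 for dur in durations if dur == t)
        if d > 0:
            survival *= 1 - d / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve

# Made-up example: tenure in months, 1 = churned, 0 = still a customer (censored)
durations = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]

curve = kaplan_meier(durations, events)
for t, s in curve:
    print(t, round(s, 4))
```

Note how the censored customers at months 2 and 4 produce no step in the curve but do shrink the denominator for later steps.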
6. Predicting churn trends with a Cox proportional hazards model
# Split into training and test sets
train_data, test_data = train_test_split(data, test_size=0.2)
print([column for column in train_data])
['customerID', 'gender', 'SeniorCitizen', 'Partner', 'Dependents', 'tenure', 'PhoneService', 'MultipleLines', 'InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling', 'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'Churn']
Build the Cox proportional hazards model
formula ='gender+SeniorCitizen+Partner+Dependents+PhoneService+ \
MultipleLines+InternetService+OnlineSecurity+OnlineBackup+ \
DeviceProtection+TechSupport+StreamingTV+StreamingMovies+ \
Contract+PaperlessBilling+PaymentMethod+MonthlyCharges+TotalCharges'
model = CoxPHFitter(penalizer=0.01, l1_ratio=1)
model = model.fit(train_data.drop("customerID",axis=1), 'tenure', event_col='Churn',formula=formula)
model.print_summary()
model | lifelines.CoxPHFitter |
duration col | 'tenure' |
event col | 'Churn' |
penalizer | 0.01 |
l1 ratio | 1 |
baseline estimation | breslow |
number of observations | 5625 |
number of events observed | 1493 |
partial log-likelihood | -10197.70 |
time fit was run | 2022-10-28 01:18:04 UTC |
coef | exp(coef) | se(coef) | coef lower 95% | coef upper 95% | exp(coef) lower 95% | exp(coef) upper 95% | cmp to | z | p | -log2(p) | |
Contract | -1.49 | 0.23 | 0.08 | -1.64 | -1.33 | 0.19 | 0.26 | 0.00 | -18.76 | <0.005 | 258.51 |
Dependents | -0.10 | 0.91 | 0.08 | -0.25 | 0.05 | 0.78 | 1.05 | 0.00 | -1.27 | 0.21 | 2.28 |
DeviceProtection | -0.04 | 0.96 | 0.03 | -0.10 | 0.01 | 0.90 | 1.01 | 0.00 | -1.47 | 0.14 | 2.82 |
InternetService | -0.08 | 0.92 | 0.05 | -0.18 | 0.02 | 0.83 | 1.02 | 0.00 | -1.63 | 0.10 | 3.28 |
MonthlyCharges | 0.04 | 1.05 | 0.00 | 0.04 | 0.05 | 1.04 | 1.05 | 0.00 | 23.38 | <0.005 | 399.04 |
MultipleLines | -0.00 | 1.00 | 0.00 | -0.00 | 0.00 | 1.00 | 1.00 | 0.00 | -0.00 | 1.00 | 0.00 |
OnlineBackup | -0.09 | 0.91 | 0.03 | -0.15 | -0.04 | 0.86 | 0.97 | 0.00 | -3.14 | <0.005 | 9.20 |
OnlineSecurity | -0.19 | 0.83 | 0.04 | -0.26 | -0.12 | 0.77 | 0.89 | 0.00 | -5.16 | <0.005 | 21.97 |
PaperlessBilling | 0.09 | 1.09 | 0.06 | -0.04 | 0.21 | 0.96 | 1.23 | 0.00 | 1.38 | 0.17 | 2.57 |
Partner | -0.15 | 0.86 | 0.06 | -0.27 | -0.02 | 0.77 | 0.98 | 0.00 | -2.36 | 0.02 | 5.79 |
PaymentMethod | 0.16 | 1.18 | 0.03 | 0.10 | 0.22 | 1.11 | 1.25 | 0.00 | 5.49 | <0.005 | 24.58 |
PhoneService | -0.00 | 1.00 | 0.00 | -0.00 | 0.00 | 1.00 | 1.00 | 0.00 | -0.00 | 1.00 | 0.00 |
SeniorCitizen | 0.02 | 1.02 | 0.06 | -0.11 | 0.14 | 0.90 | 1.15 | 0.00 | 0.29 | 0.77 | 0.37 |
StreamingMovies | -0.02 | 0.98 | 0.03 | -0.08 | 0.04 | 0.92 | 1.04 | 0.00 | -0.66 | 0.51 | 0.97 |
StreamingTV | -0.02 | 0.98 | 0.03 | -0.08 | 0.04 | 0.92 | 1.04 | 0.00 | -0.60 | 0.55 | 0.87 |
TechSupport | -0.13 | 0.88 | 0.04 | -0.20 | -0.06 | 0.82 | 0.95 | 0.00 | -3.51 | <0.005 | 11.14 |
TotalCharges | -0.00 | 1.00 | 0.00 | -0.00 | -0.00 | 1.00 | 1.00 | 0.00 | -32.61 | <0.005 | 772.51 |
gender | -0.00 | 1.00 | 0.00 | -0.00 | 0.00 | 1.00 | 1.00 | 0.00 | -0.00 | 1.00 | 0.00 |
Concordance | 0.93 |
Partial AIC | 20431.41 |
log-likelihood ratio test | 3935.73 on 18 df |
-log2(p) of ll-ratio test | inf |
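The concordance index (Concordance) is 0.93, which indicates that the model ranks customers by churn risk very well. Note that this value is computed on the training data, and that TotalCharges is roughly tenure multiplied by MonthlyCharges, so it carries information about the duration itself and may inflate the score.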
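To make the concordance index concrete: it is the share of comparable customer pairs in which the customer who churned earlier was also given the higher predicted risk. The sketch below is a simplified version of Harrell's C on made-up numbers (it ignores tied durations), intended only as an illustration.

```python
def concordance_index(durations, events, risk_scores):
    """Simplified Harrell's C: ties in duration are ignored, ties in risk count as 0.5."""
    concordant = 0.0
    comparable = 0
    n = len(durations)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i churned (event observed) strictly before j's time
            if events[i] == 1 and durations[i] < durations[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable

# Made-up example: a higher risk score should mean earlier churn
durations = [2, 5, 7, 10]
events = [1, 1, 0, 1]
risk_scores = [0.9, 0.4, 0.6, 0.1]

c_index = concordance_index(durations, events, risk_scores)
print(c_index)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. For a less optimistic estimate than the training-set figure, the index can be evaluated on the held-out `test_data`, for example with `lifelines.utils.concordance_index`.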
7. Evaluating the predictions
Hazard ratios
plt.figure(figsize = (6,10),dpi=600)
model.plot(hazard_ratios=True)
plt.xlabel('Hazard Ratios (95% CI)')
plt.title('Hazard Ratios')
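A hazard ratio is simply exp(coef): the factor by which the churn hazard is multiplied when the covariate increases by one unit. The short calculation below uses the Contract coefficient from the summary table (-1.49) and assumes the label encoding shown earlier, where Month-to-month is 0, One year is 1, and Two year is 2.

```python
import math

coef_contract = -1.49  # Contract coefficient from the model summary above

# One step up in the encoded contract term (e.g. Month-to-month -> One year)
hr_one_step = math.exp(coef_contract)
# Two steps up (Month-to-month -> Two year): hazard ratios multiply
hr_two_steps = math.exp(2 * coef_contract)

print(f"one step:  {hr_one_step:.3f}")
print(f"two steps: {hr_two_steps:.3f}")
```

Read this way, the model estimates that a one-year contract carries roughly a quarter of the month-to-month churn hazard, with the caveat that treating the encoded contract term as a single numeric covariate forces each step to have the same multiplicative effect.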
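Brier score
# Predict the survival curves once; rows are months, columns are test customers
surv_test = model.predict_survival_function(test_data)
loss_dict = {}
for i in range(1, 73):  # tenure runs from 1 to 72 months
    # Compare the predicted probability of having churned by month i with the observed churn flag
    score = brier_score_loss(test_data['Churn'], 1 - surv_test.loc[i].values, pos_label=1)
    loss_dict[i] = [score]
loss_df = pd.DataFrame(loss_dict).T
fig, ax = plt.subplots(dpi=600)
ax.plot(loss_df.index, loss_df)
ax.set(xlabel='Prediction Time', ylabel='Calibration Loss', title='Cox PH Model Calibration Loss / Time')
plt.show()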
The plot suggests that the model predicts churn well within the first 40 months.
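As a reminder of what is being plotted, the Brier score is the mean squared difference between the predicted probability and the 0/1 outcome, so lower is better and always predicting 0.5 scores 0.25. A minimal sketch on made-up numbers (the helper name `brier_score` is for illustration only):

```python
def brier_score(outcomes, predicted_probs):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    squared_errors = [(p - y) ** 2 for y, p in zip(outcomes, predicted_probs)]
    return sum(squared_errors) / len(squared_errors)

# Made-up example: 1 = churned, 0 = retained
outcomes = [1, 0, 1, 0]
predicted_probs = [0.9, 0.2, 0.6, 0.4]

score = brier_score(outcomes, predicted_probs)
baseline = brier_score(outcomes, [0.5] * len(outcomes))
print(score, baseline)
```

In the loop above, a customer who had not churned when the data was collected is counted as retained at every horizon, so the curve is an approximation rather than a censoring-adjusted Brier score.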
Calibration curve
plt.figure(figsize=(10, 10),dpi=600)
ax = plt.subplot2grid((3, 1), (0, 0), rowspan=2)
ax.plot([0, 1], [0, 1], "k:", label="Perfectly calibrated")
# Predicted probability of having churned by month 7
probs = 1-np.array(model.predict_survival_function(test_data).loc[7])
actual = test_data['Churn']
# The normalize argument is omitted: it was deprecated and later removed from scikit-learn
fraction_of_positives, mean_predicted_value = calibration_curve(actual, probs, n_bins=10)
ax.plot(mean_predicted_value, fraction_of_positives, "s-", label="%s" % ("CoxPH",))
ax.set_ylabel("Fraction of positives")
ax.set_ylim([-0.05, 1.05])
ax.legend(loc="lower right")
ax.set_title('Calibration plots (reliability curve)')
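The plot shows that the model underestimates customer retention, that is, it overestimates churn.
8. Predicting churn for sampled customers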
nochurn_data=test_data.loc[test_data['Churn']==0]
churn_clients = pd.DataFrame(model.predict_survival_function(nochurn_data))
churn_clients
3943 | 496 | 2618 | 6676 | 1311 | 5387 | 3015 | 2080 | 5445 | 4095 | 2928 | 6376 | 5230 | 870 | 5422 | ... | 2149 | 1380 | 501 | 4210 | 2110 | 6050 | 2937 | 5771 | 5022 | 190 | 1188 | 5236 | 5974 | 1668 | 1312 | |
1.00 | 0.91 | 0.99 | 1.00 | 1.00 | 1.00 | 0.96 | 1.00 | 1.00 | 1.00 | 0.77 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | ... | 1.00 | 1.00 | 1.00 | 1.00 | 0.98 | 1.00 | 1.00 | 0.98 | 1.00 | 1.00 | 1.00 | 1.00 | 0.99 | 1.00 | 1.00 |
2.00 | 0.87 | 0.99 | 1.00 | 1.00 | 1.00 | 0.95 | 1.00 | 1.00 | 1.00 | 0.68 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | ... | 1.00 | 1.00 | 1.00 | 1.00 | 0.98 | 1.00 | 1.00 | 0.98 | 1.00 | 1.00 | 1.00 | 1.00 | 0.99 | 1.00 | 1.00 |
3.00 | 0.84 | 0.98 | 1.00 | 1.00 | 1.00 | 0.93 | 1.00 | 1.00 | 1.00 | 0.62 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | ... | 1.00 | 1.00 | 1.00 | 1.00 | 0.97 | 1.00 | 1.00 | 0.97 | 1.00 | 1.00 | 1.00 | 1.00 | 0.98 | 1.00 | 1.00 |
4.00 | 0.80 | 0.98 | 1.00 | 1.00 | 1.00 | 0.92 | 1.00 | 1.00 | 1.00 | 0.55 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | ... | 1.00 | 1.00 | 1.00 | 1.00 | 0.96 | 1.00 | 1.00 | 0.97 | 1.00 | 1.00 | 1.00 | 1.00 | 0.98 | 1.00 | 1.00 |
5.00 | 0.77 | 0.97 | 1.00 | 0.99 | 1.00 | 0.91 | 1.00 | 1.00 | 1.00 | 0.49 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | ... | 1.00 | 1.00 | 1.00 | 1.00 | 0.96 | 1.00 | 1.00 | 0.96 | 1.00 | 1.00 | 0.99 | 1.00 | 0.97 | 1.00 | 1.00 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
68.00 | 0.00 | 0.00 | 0.06 | 0.03 | 0.20 | 0.00 | 0.97 | 0.99 | 0.96 | 0.00 | 1.00 | 0.80 | 0.97 | 1.00 | 0.98 | ... | 0.32 | 0.94 | 0.98 | 0.60 | 0.00 | 0.74 | 0.90 | 0.00 | 1.00 | 0.41 | 0.03 | 0.20 | 0.00 | 0.09 | 0.59 |
69.00 | 0.00 | 0.00 | 0.04 | 0.02 | 0.16 | 0.00 | 0.97 | 0.99 | 0.95 | 0.00 | 1.00 | 0.78 | 0.97 | 1.00 | 0.98 | ... | 0.28 | 0.93 | 0.98 | 0.56 | 0.00 | 0.71 | 0.89 | 0.00 | 1.00 | 0.36 | 0.02 | 0.16 | 0.00 | 0.07 | 0.55 |
70.00 | 0.00 | 0.00 | 0.01 | 0.00 | 0.09 | 0.00 | 0.96 | 0.98 | 0.94 | 0.00 | 0.99 | 0.71 | 0.96 | 0.99 | 0.97 | ... | 0.18 | 0.91 | 0.97 | 0.46 | 0.00 | 0.64 | 0.86 | 0.00 | 1.00 | 0.26 | 0.01 | 0.08 | 0.00 | 0.03 | 0.45 |
71.00 | 0.00 | 0.00 | 0.01 | 0.00 | 0.05 | 0.00 | 0.95 | 0.98 | 0.93 | 0.00 | 0.99 | 0.67 | 0.95 | 0.99 | 0.96 | ... | 0.13 | 0.89 | 0.97 | 0.40 | 0.00 | 0.58 | 0.83 | 0.00 | 1.00 | 0.20 | 0.00 | 0.05 | 0.00 | 0.01 | 0.39 |
72.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.04 | 0.00 | 0.95 | 0.98 | 0.92 | 0.00 | 0.99 | 0.63 | 0.95 | 0.99 | 0.96 | ... | 0.10 | 0.88 | 0.96 | 0.35 | 0.00 | 0.54 | 0.81 | 0.00 | 1.00 | 0.16 | 0.00 | 0.03 | 0.00 | 0.01 | 0.34 |
72 rows × 1031 columns
plt.figure(figsize=(10, 10),dpi=600)
# Plot the predicted survival curves of a few sampled customers (columns are customer row indexes)
churn_clients[churn_clients.columns[0]].plot(color='c')
churn_clients[churn_clients.columns[1]].plot(color='y')
churn_clients[churn_clients.columns[21]].plot(color='m')
churn_clients[1311].plot(color='g')
# Draw the 0.5 retention threshold across the whole time axis
plt.axhline(0.5, color='k', linestyle='--', label='Threshold=0.5')
plt.ylim(0,1)
plt.xlim(0,72)
plt.xlabel('Timeline')
plt.ylabel('Retain probability')
plt.legend(loc='best')
plt.title('The Churn Trend of Samples')
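The plot shows that for the customer with index 6109, the predicted retention probability first falls below 50% at month 10, and the customer is predicted to churn between months 30 and 40.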
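One refinement worth considering: the curves above start at month 0 for every customer, although these customers have already stayed for their current tenure. The survival probability conditional on having survived to month t0 is S(t) / S(t0). The sketch below shows the calculation on a made-up survival curve; `conditional_survival` and `first_month_below` are hypothetical helpers.

```python
def conditional_survival(surv, t0):
    """surv maps month -> S(month); return {months after t0: S(t0 + k) / S(t0)}."""
    base = surv[t0]
    return {t - t0: s / base for t, s in sorted(surv.items()) if t > t0}

def first_month_below(cond, threshold=0.5):
    """Return the first number of months ahead at which retention drops below threshold."""
    for months_ahead, s in cond.items():
        if s < threshold:
            return months_ahead
    return None

# Made-up unconditional survival curve for one customer
surv = {1: 0.95, 2: 0.90, 3: 0.80, 4: 0.60, 5: 0.40, 6: 0.30}

# The customer has already stayed 2 months
cond = conditional_survival(surv, t0=2)
months_until_risky = first_month_below(cond)
print(cond)
print(months_until_risky)
```

lifelines supports this directly through the `conditional_after` argument of `predict_survival_function`, for example by passing the customers' current `tenure` values, which gives each active customer a curve measured from today rather than from sign-up.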
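VII. Tiered customer win-back
Based on the analysis above, the recommendations for winning back customers are:
1) Segment customers by churn-risk level and apply a different operations strategy to each level;
2) Use the churn prediction model to intervene at the right point in time, so that customers are retained at minimal cost;
3) Further improvement: bring in additional data, such as customer spending data, to segment customers by value and manage them at a finer granularity.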