Time Series Forecasting in Python



Prophet is an open-source time series forecasting library developed by Facebook. It is designed to be usable without expert knowledge of statistics or time series forecasting. Prophet builds a model by fitting a smooth curve that is expressed as the sum of the following components:

y(t) = g(t) + s(t) + h(t) + ϵₜ

  • Overall growth trend: g(t)
  • Yearly seasonality: s(t)
  • Weekly seasonality: s(t)
  • Holiday effects: h(t)
  • Error term: ϵₜ (noise the model does not explain)
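The toy numpy sketch below makes the additive structure concrete. It builds a series from made-up components: a linear trend for g(t), a 7-day sine wave for s(t), three hypothetical holiday bumps for h(t), and Gaussian noise for ϵₜ. The sketch only shows how the pieces add up. It is not how Prophet estimates them.

```python
import numpy as np

# Hypothetical component shapes, chosen only to illustrate y(t) = g(t) + s(t) + h(t) + eps_t
rng = np.random.default_rng(seed=0)
t = np.arange(180)                        # 180 daily time steps

g = 10.0 + 0.05 * t                       # trend g(t): linear growth
s = 2.0 * np.sin(2 * np.pi * t / 7.0)     # seasonality s(t): a 7-day (weekly) cycle
h = np.zeros_like(t, dtype=float)         # holiday effect h(t): zero except on "holidays"
h[[30, 90, 150]] = 5.0                    # three hypothetical holiday dates
eps = rng.normal(0.0, 0.5, size=t.size)   # noise eps_t

y = g + s + h + eps                       # the observed series is the plain sum
```

Prophet works in the opposite direction: it starts from y and estimates g, s and h.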

In this series of blog posts, we walk through the following functions of the fbprophet library, each with an example:

  1. Prophet.fit
  2. Prophet.predict
  3. Prophet.plot
  4. Prophet.plot_components
  5. Prophet.add_seasonality
  6. Prophet.add_regressor
  7. Prophet.seasonalities
  8. Prophet.predictive_samples

Let’s start by describing the sample data set that we will be using for our demonstration.

(Data Description)

We will use a synthetic daily time series (shown below) covering 180 days, with the columns date, target, regr1 and regr2. target is the value we want to predict for each day. regr1 and regr2 are external factors that affect the target value.
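The CSV file is not reproduced in this post. To follow along without it, you can generate a hypothetical stand-in with the same four columns. In the sketch below, the start date, the days on which each regressor is active, the 5% and 20% lifts and the noise level are all invented. They are not the contents of ts_with_2regressors.csv.

```python
import datetime as dt
import random


def make_standin_rows(n_days=180, seed=0):
    """Build a hypothetical daily series with columns date, target, regr1, regr2."""
    rng = random.Random(seed)
    start = dt.date(2018, 9, 1)                 # arbitrary start date
    rows = []
    for i in range(n_days):
        day = start + dt.timedelta(days=i)
        regr1 = 1 if i % 20 == 5 else 0         # hypothetical: regr1 active every 20 days
        regr2 = 1 if i % 45 == 10 else 0        # hypothetical: regr2 active every 45 days
        # slow upward trend plus a small weekend bump
        base = 100 + 0.1 * i + (5 if day.weekday() >= 5 else 0)
        # invented multiplicative lifts of 5% (regr1) and 20% (regr2), plus noise
        target = base * (1 + 0.05 * regr1 + 0.20 * regr2) + rng.gauss(0, 1)
        rows.append({"date": day.isoformat(), "target": round(target, 2),
                     "regr1": regr1, "regr2": regr2})
    return rows


rows = make_standin_rows()
```

You can write these rows to ts_with_2regressors.csv with csv.DictWriter. The pandas code in this post should then run unchanged, though its plots will of course differ from the ones shown.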

Let's see the data visually by plotting the target and regressor columns.

# Importing Libraries
import pandas as pd
# loading the time series data into a dataframe
df = pd.read_csv('ts_with_2regressors.csv')
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
# plotting the time series data
df.plot(x='date', y='target', figsize=(20, 5), title='Time series Data')

Sample Time Series Data

# plotting the regressors
import matplotlib.pyplot as plt

ax = df.plot(x='date', y='target', figsize=(20, 5), title='Regressors Effect')
y_min = df.target.min()
y_max = df.target.max()
# dashed purple lines: days with regr1 == 1; dotted green lines: days with regr2 == 1
plt.vlines(x=list(df[df['regr1'] == 1]['date'].values), ymin=y_min, ymax=y_max, colors='purple', linestyles='--', lw=2, label='regr1')
plt.vlines(x=list(df[df['regr2'] == 1]['date'].values), ymin=y_min, ymax=y_max, colors='green', linestyles=':', lw=2, label='regr2')
plt.legend(bbox_to_anchor=(1.04, 0.5), loc="center left")
plt.show()

Regressors Effect

We can see that the spikes in the time series data are due to the regressors (regr1, regr2). We will see how to capture and model these regressors in the coming sections.

(Installation of Prophet:)

As with most Python libraries, you can install fbprophet using pip. Prophet's major dependency is PyStan.

# Install pystan with pip before using pip to install fbprophet
pip install pystan
pip install fbprophet
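Release 1.0 of the library renamed the package from fbprophet to prophet. The new name is installed with pip install prophet and imported with from prophet import Prophet. If your code has to work with either name, the small sketch below tries both. The helper name load_prophet_class is hypothetical.

```python
import importlib


def load_prophet_class():
    """Return the Prophet class from whichever package is installed, or None."""
    # 'prophet' is the package name from release 1.0 onward; 'fbprophet' is the older name.
    for module_name in ("prophet", "fbprophet"):
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue
        return module.Prophet
    return None


Prophet = load_prophet_class()
if Prophet is None:
    print("Neither 'prophet' nor 'fbprophet' is installed.")
```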

Let us now see how to use the above functions:

(Generating a Forecast:)

  • Prophet follows the scikit-learn model API. We create an instance of the Prophet class and then call its fit (Prophet.fit) and predict (Prophet.predict) methods.
  • The input to Prophet is always a data frame with two columns: ds and y. The ds (datestamp) column should be in a format that pandas can parse. Ideally this is YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp. The y column must be numeric and represents the measurement we wish to forecast.
  • For demonstration, we use the target values of the first 150 days as training data and predict target for all 180 days. The last 30 days are therefore a genuine out-of-sample forecast.

Note: for this step, we only use the date and target columns.

from fbprophet import Prophet  # with Prophet >= 1.0: from prophet import Prophet

# Creating train and predict dataframes
df = df.rename(columns={'date': 'ds', 'target': 'y'})
df_train = df[['ds', 'y']].iloc[:150]
df_predict = df[['ds']]
# Fitting a Prophet model
model = Prophet()
model.fit(df_train)
forecast = model.predict(df_predict)
forecast.head()

Forecast Head

# plotting the actual and forecast values
ax = (df.plot(x='ds',y='y',figsize=(20,5),title='Actual Vs Forecast'))
forecast.plot(x='ds',y='yhat',figsize=(20,5),title='Actual vs Forecast', ax=ax)

Actual vs Forecast

The output above shows that Prophet fits the data reasonably well overall, but it cannot capture the sudden jumps in the data. These jumps are caused by the external regressors, which Prophet does not detect by default. In the coming sections, we will see how to make Prophet model these external factors.
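You can check the claim that the misfit is concentrated on regressor days. One way is to flag training days with unusually large residuals and count how many of them fall on days when regr1 or regr2 is active. The numpy sketch below does this. The function name is hypothetical, and the 2-standard-deviation threshold is an arbitrary choice.

```python
import numpy as np


def large_residual_days(y, yhat, regr_active, k=2.0):
    """Return the indices whose residual exceeds k standard deviations,
    plus the fraction of those indices on which some regressor is active."""
    y = np.asarray(y, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    active = np.asarray(regr_active, dtype=bool)
    resid = y - yhat
    flagged = np.flatnonzero(np.abs(resid) > k * resid.std())
    share = float(active[flagged].mean()) if flagged.size else 0.0
    return flagged, share
```

With the objects from this post, a call might look like this: large_residual_days(df['y'].iloc[:150], forecast['yhat'].iloc[:150], (df['regr1'] + df['regr2']).iloc[:150] > 0). A share close to 1.0 would support the explanation above.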

(Plotting the Forecast :)

  • We can plot the forecast and its components by calling the Prophet.plot and Prophet.plot_components methods and passing in the forecast dataframe, as shown below.
  • The forecast plot is a single graph containing a scatter plot of historical data points indicated by black dots and the forecast/fitted curve indicated by a blue line. The graph also contains a light blue shaded region which corresponds to the uncertainty bands.
  • The components plot is a group of plots, one for each time series component (trend, seasonalities) and external effect.
# Plotting the generated forecast
fig1 = model.plot(forecast, uncertainty=True)

Forecast Output Plot

# Plotting the forecast components.
fig2 = model.plot_components(forecast)

Forecast Component Plot

As mentioned at the start, Prophet estimates the trend and seasonality components from the training data. We only have 150 days of daily history, so the default settings fit a trend and a weekly_seasonality. These are the two components shown above.
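You can reproduce the automatic choice of seasonalities by hand. As far as we can tell from Prophet's source, the rules are as follows:
  • Yearly seasonality is enabled only with at least two years of history.
  • Weekly seasonality needs at least two weeks of history and observations spaced less than a week apart.
  • Daily seasonality needs at least two days of history and sub-daily observations.

The stdlib sketch below encodes that rule. The function name is hypothetical; the real check runs inside Prophet's fitting code.

```python
import datetime as dt


def default_seasonalities(timestamps):
    """Guess which built-in seasonalities Prophet enables when left on 'auto'."""
    ts = sorted(timestamps)
    span = ts[-1] - ts[0]
    # smallest positive spacing between consecutive observations
    min_gap = min(b - a for a, b in zip(ts, ts[1:]) if b > a)
    return {
        "yearly": span >= dt.timedelta(days=730),
        "weekly": span >= dt.timedelta(weeks=2) and min_gap < dt.timedelta(weeks=1),
        "daily": span >= dt.timedelta(days=2) and min_gap < dt.timedelta(days=1),
    }


# 150 consecutive daily timestamps, like df_train (the start date is arbitrary)
start = dt.datetime(2018, 9, 1)
train_days = [start + dt.timedelta(days=i) for i in range(150)]
print(default_seasonalities(train_days))
```

For 150 daily points, only the weekly seasonality comes out enabled. This matches the component plot above.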

Let us now look at the two plots above in more detail:

(Forecast Output Plot:)

  • The x-axis shows the date values (ds) for both historical and future dates.
  • The y-axis shows the target values (y, yhat) for both historical and future dates.
  • The black dots represent the historical training data points.
  • The blue line represents the forecast generated for both the history and the future.
  • The light blue region represents the uncertainty bands (we will see more about this in the coming sections).

(Forecast Component Plot:)

  • The x-axis shows the date values (ds) for both historical and future dates.
  • The y-axis shows Prophet's estimate for the respective forecast component (trend, seasonality).
  • Graph 1: the trend value for all dates (history and future).
  • Graph 2: weekly_seasonality, the weekly profile for each day of the week, estimated from the training data.

As we can see, it is easy to get started with Prophet and obtain a reasonable forecast model for your time series data.

(Adding Custom Seasonalities)

  • In Prophet, we can model custom seasonalities using the method Prophet.add_seasonality.
  • By default, Prophet automatically models additive daily, weekly and yearly seasonalities. Each one is enabled only when the training data is long and fine-grained enough.
  • We can inspect the inferred seasonalities through the attribute Prophet.seasonalities.
  • Let us now use add_seasonality to model a monthly seasonality.
# Modelling a custom monthly seasonality
model2 = Prophet()
# custom seasonalities must be added before calling fit
model2.add_seasonality(name='custom_monthly', period=30.5, fourier_order=10)
model2.fit(df_train)
forecast2 = model2.predict(df_predict)
print(model2.seasonalities)

fig1 = model2.plot(forecast2, uncertainty=True)

fig2 = model2.plot_components(forecast2, uncertainty=True)


The graphs above show that Prophet has modeled a custom_monthly seasonality. The forecast also changes slightly compared to the default forecast.
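Prophet represents each seasonality as a truncated Fourier series. For a given period P and fourier_order N, it builds N sine and N cosine columns, sin(2πnt/P) and cos(2πnt/P) for n = 1..N, and learns one coefficient per column. The numpy sketch below builds that feature matrix for the custom_monthly settings (period=30.5, fourier_order=10). Here t is measured in days from an arbitrary origin, and the function name is hypothetical.

```python
import numpy as np


def fourier_features(t_days, period, fourier_order):
    """Return a (len(t_days), 2 * fourier_order) matrix of sine/cosine terms."""
    t = np.asarray(t_days, dtype=float)
    columns = []
    for n in range(1, fourier_order + 1):
        angle = 2.0 * np.pi * n * t / period
        columns.append(np.sin(angle))   # sine term of harmonic n
        columns.append(np.cos(angle))   # cosine term of harmonic n
    return np.column_stack(columns)


X = fourier_features(np.arange(180), period=30.5, fourier_order=10)
print(X.shape)  # (180, 20)
```

A higher fourier_order adds higher harmonics, which lets the seasonal curve change faster within each period. The trade-off is a greater risk of overfitting.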

(Adding External Regressors)

So far, the Prophet model has not been able to fit some of the points in the training data. We know that these values deviate from the regular pattern because of the external regressors (regr1, regr2).

Let us now see how to capture these effects and model them.

  • As with seasonalities, Prophet can also model external factors that affect the target value, using the method Prophet.add_regressor.
  • Our sample data has two external regressors that affect the target value.
  • To model and predict the regressor effects, both the training and the prediction dataframes must contain the regressor columns. This means the regressor values must also be known for the future dates.
  • Let us now see how to model these regressors using add_regressor.

Note: regressors must be numeric. If a regressor contains string data, you have to one-hot encode it first.
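The plain-Python sketch below shows minimal one-hot encoding for a hypothetical string-valued regressor, such as a daily promotion type. Each resulting 0/1 column would then be registered with its own add_regressor call. On a DataFrame, pandas.get_dummies does the same job.

```python
def one_hot(values, prefix):
    """Turn a list of category labels into 0/1 columns, one per distinct label."""
    categories = sorted(set(values))
    return {f"{prefix}_{c}": [1 if v == c else 0 for v in values] for c in categories}


# Hypothetical string-valued regressor: the promotion running on each day
promo = ["none", "coupon", "none", "flash_sale", "coupon"]
encoded = one_hot(promo, prefix="promo")
```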

# adding regressor data in historical and future dates
df_train3 = (df[['ds', 'y', 'regr1', 'regr2']]
             .iloc[:150]
             .copy())
df_predict3 = df[['ds', 'regr1', 'regr2']].copy()
# modelling external regressors prior to model fitting
model3 = Prophet()
model3.add_regressor('regr1')
model3.add_regressor('regr2')
# fit and predict
model3.fit(df_train3)
forecast3 = model3.predict(df_predict3)
# Plot the forecast
fig1 = model3.plot(forecast3, uncertainty=True)

Prophet model with external regressors

# plot model components
fig2 = model3.plot_components(forecast3, uncertainty=True)

Prophet Model Components with external regressors

The output above shows that the model has captured both external effects quite well. The component plot suggests that the regressors lift the target value by roughly 5% and 20%.
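One way to arrive at lift figures like these is to compare a regressor's own component with the rest of the prediction on the days that regressor is active. The sketch below does this. It assumes the forecast frame exposes each regressor's additive contribution in a column named after the regressor (for example forecast3['regr1']). The function name and the numbers in the example are made up.

```python
import numpy as np


def relative_lift(yhat, regressor_component, active):
    """Average lift of one regressor on its active days, relative to the
    prediction with that regressor's additive contribution removed."""
    yhat = np.asarray(yhat, dtype=float)
    comp = np.asarray(regressor_component, dtype=float)
    mask = np.asarray(active, dtype=bool)
    baseline = yhat[mask] - comp[mask]   # prediction without this regressor
    return float(np.mean(comp[mask] / baseline))
```

With the objects above, a call would be relative_lift(forecast3['yhat'], forecast3['regr1'], df_predict3['regr1'] == 1).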

(Prophet Posterior Samples)

  • By default, Prophet generates 1000 posterior samples for each row (here, each day) to estimate the upper and lower bounds of the uncertainty bands. On a given day, the mean of the posterior samples is almost equal to the forecast value yhat.
  • Prophet lets us access the posterior samples for any day, in the history or the future, using the method Prophet.predictive_samples.
  • We can change the number of samples when instantiating Prophet, using the parameter uncertainty_samples.
  • Let us now generate a data frame with the posterior samples for one week of the prediction data frame.
# Select one week from prediction df
df_1wk = df_predict3.iloc[:7]
df_1wk

# fetching the posterior samples
samples = model3.predictive_samples(df_1wk)
df_samples = pd.DataFrame(data=samples['yhat'], index=df_1wk['ds']).reset_index()  # one row per date, one column per sample
df_samples

Let us now compute the mean of posterior samples for one day and compare it with that day’s forecast.

# Forecast 
forecast3[forecast3['ds'] == '2018-09-02'][['ds', 'yhat']]

# Mean of the posterior samples.
df_samples[df_samples['ds'] == '2018-09-02'].set_index('ds').mean(axis=1)


We can see that the mean of the posterior samples is almost equal to the forecast value for a particular day.
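The uncertainty bands come from these samples. Prophet takes percentiles of the draws, with the spread set by interval_width: the default of 0.80 gives the 10th and 90th percentiles. yhat itself is computed from the point estimate of the model parameters, which is why the sample mean is only approximately equal to it. The numpy sketch below shows the percentile step on simulated draws. The normal distribution and the centre value of 50 stand in for real output of predictive_samples, and the function name is hypothetical.

```python
import numpy as np


def summarize_samples(draws, interval_width=0.80):
    """Return (mean, lower, upper) for one day's draws, using percentile bounds."""
    draws = np.asarray(draws, dtype=float)
    lower_pct = 100 * (1 - interval_width) / 2   # 10.0 for the default width
    upper_pct = 100 * (1 + interval_width) / 2   # 90.0 for the default width
    return (float(draws.mean()),
            float(np.percentile(draws, lower_pct)),
            float(np.percentile(draws, upper_pct)))


rng = np.random.default_rng(seed=1)
draws = rng.normal(loc=50.0, scale=2.0, size=1000)  # stand-in for 1000 samples of one day
mean, lower, upper = summarize_samples(draws)
```

Applied to one row of df_samples (all columns except ds), the bounds should be close to that day's yhat_lower and yhat_upper in forecast3. Some random variation is expected, because the samples are drawn again on each call.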

(Conclusion:)

This brings us to the end of our demonstration. We hope you found the post informative. You can find the data and the Jupyter notebook used here in the repository below:



Do try modelling your time series data using prophet and share your thoughts in the comments section.
