(Background: LSTMs vs. CNNs)

An LSTM (long short-term memory network) is a type of recurrent neural network that can account for sequential dependencies in a time series.


Because correlations exist between observations in a given time series (a phenomenon known as autocorrelation), a standard neural network, which treats all observations as independent, would generate misleading results.

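As a quick check of this idea, the autocorrelation of a series can be inspected with pandas before any modelling. Below is a minimal sketch, assuming the daily cancellation counts live in a pandas Series (the numbers here are made up for illustration, not taken from the dataset):

import pandas as pd

# hypothetical daily cancellation counts standing in for the real series
series = pd.Series([14, 22, 19, 30, 25, 41, 38, 17, 20, 28, 33, 45])

# lag-1 and lag-7 autocorrelation; values away from zero indicate that
# observations depend on their predecessors and cannot be treated as independent
print(series.autocorr(lag=1))
print(series.autocorr(lag=7))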

A convolutional neural network is one that applies a process known as convolution to determine the relationship between two functions: given two functions f and g, the convolution integral expresses how the shape of one function is modified by the other. Such networks are traditionally used for image classification, and they do not account for sequential dependencies in the way that a recurrent neural network can.


However, the main advantage of CNNs that makes them suited to forecasting time series is the dilated convolution: the ability to apply filters with a dilation, i.e. a gap, between the cells they cover. Increasing the size of this gap allows the neural network to better capture relationships between observations that lie further apart in the time series.

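As a quick sketch of what this looks like in Keras (a toy layer on random data, not the model built later in this post), a dilation rate of 2 makes a kernel of size 2 cover time steps t and t-2 rather than t and t-1, widening its receptive field without adding parameters:

import numpy as np
from tensorflow import keras

# a toy univariate series shaped (batch, time_steps, features)
x = np.random.rand(1, 30, 1).astype("float32")

# dilation_rate=2: the kernel skips every other input time step
dilated = keras.layers.Conv1D(filters=8, kernel_size=2,
                              dilation_rate=2, padding="causal")
print(dilated(x).shape)  # (1, 30, 8): causal padding preserves the sequence length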

For this reason, LSTM and CNN layers are often combined when forecasting a time series. This allows the LSTM layer to account for sequential dependencies in the time series, while the CNN layer further informs this process through the use of dilated convolutions.


With that being said, standalone CNNs are increasingly being used for time series forecasting, and the combination of several Conv1D layers can actually produce quite impressive results — rivalling those of a model which uses both CNN and LSTM layers.


How is this possible? Let’s find out!


The below example was designed using a CNN template from the Intro to TensorFlow for Deep Learning course from Udacity — this particular topic is found in Lesson 8: Time Series Forecasting by Aurélien Géron.


(Our Time Series Problem)

The below analysis is based on data from Antonio, Almeida and Nunes (2019): Hotel booking demand datasets.


Imagine this scenario: a hotel is having difficulty forecasting booking cancellations on a day-to-day basis, which makes it hard to forecast revenue and to allocate hotel rooms efficiently.


The hotel would like to solve this problem by building a time series model that can forecast the fluctuations in daily hotel cancellations with reasonably high accuracy.


Here is a time series plot of the fluctuations in daily hotel booking cancellations:





[Figure: Time series plot of daily hotel cancellations]

Source: Jupyter Notebook Output

(Model Configuration)

The neural network is structured as follows:



[Figure: Structure of the neural network]

Source: Image Created by Author

Here are the important model parameters that must be accounted for.


(Kernel Size)

The kernel size is set to 3, meaning that each output is calculated based on the previous three time steps.


Here is a rough illustration:



[Figure: Illustration of a kernel size of 3 across input time steps]

Source: Image Created by Author. Template adapted from Udacity — Intro to TensorFlow for Deep Learning: Time Series Forecasting

Setting the correct kernel size is a matter of experimentation: a kernel size that is too small risks poor model performance, while one that is too large risks overfitting.


As can be seen from the diagram, three input time steps are taken and used to generate a separate output.

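This behaviour can also be checked numerically. Below is a small sketch, purely illustrative, with the kernel fixed to all ones rather than trained weights, showing that each output depends only on the current and two preceding inputs:

import numpy as np
from tensorflow import keras

layer = keras.layers.Conv1D(1, kernel_size=3, padding="causal", use_bias=False)
x = np.arange(1, 7, dtype="float32").reshape(1, 6, 1)  # inputs 1, 2, ..., 6
_ = layer(x)  # run once so the layer is built and its kernel can be overwritten
layer.set_weights([np.ones((3, 1, 1), dtype="float32")])  # kernel that sums its window

print(layer(x).numpy().ravel())
# [ 1.  3.  6.  9. 12. 15.]: each output is the sum of the current and two previous steps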

(Padding)

In this instance, causal padding is used to ensure that the output sequence has the same length as the input sequence. In other words, the network "pads" time steps from the left side of the series so that future values on the right side of the series are not used in generating the forecast; using them would quite obviously lead to false results, and we would end up overestimating the accuracy of the model.

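A quick way to see the effect is to compare output lengths under causal and valid padding (a toy check on random data, separate from the forecasting model):

import numpy as np
from tensorflow import keras

x = np.random.rand(1, 10, 1).astype("float32")

causal = keras.layers.Conv1D(4, kernel_size=3, padding="causal")
valid = keras.layers.Conv1D(4, kernel_size=3, padding="valid")

print(causal(x).shape)  # (1, 10, 4): padded on the left, same length as the input
print(valid(x).shape)   # (1, 8, 4): no padding, the sequence is shortened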

(Strides)

The stride length is set to one, which means that the filter slides forward by one time step at a time when forecasting future values.


However, this could be set higher. For instance, setting the stride length to two would mean that the output sequence would be approximately half the length of the input sequence.


A long stride means the model may discard valuable data when generating the forecast, but increasing the stride length can be useful for capturing longer-term trends and smoothing out noise in the series.

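For illustration, here is what a stride of 2 does to the output length (again a toy layer on random data, not part of the model configured below):

import numpy as np
from tensorflow import keras

x = np.random.rand(1, 20, 1).astype("float32")

strided = keras.layers.Conv1D(4, kernel_size=3, strides=2, padding="causal")
print(strided(x).shape)  # (1, 10, 4): roughly half the length of the input sequence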

Here is the model configuration:


from tensorflow import keras

model = keras.models.Sequential([
  # Conv1D layer: kernel_size=3 with causal padding, so each output
  # depends only on the current and previous time steps
  keras.layers.Conv1D(filters=32, kernel_size=3,
                      strides=1, padding="causal",
                      activation="relu",
                      input_shape=[None, 1]),
  # LSTM layer accounts for sequential dependencies across the window
  keras.layers.LSTM(32, return_sequences=True),
  keras.layers.Dense(1),
  # scale the output back up to the magnitude of the cancellation counts
  keras.layers.Lambda(lambda x: x * 200)
])
# gradually increase the learning rate each epoch to find a suitable value
lr_schedule = keras.callbacks.LearningRateScheduler(
    lambda epoch: 1e-8 * 10**(epoch / 20))
optimizer = keras.optimizers.SGD(learning_rate=1e-8, momentum=0.9)
model.compile(loss=keras.losses.Huber(),
              optimizer=optimizer,
              metrics=["mae"])

(Results)

Firstly, let’s make forecasts using the above model on different window sizes.


It is important that the window size is large enough to account for the volatility across time steps.


Suppose we start with a window size of 5.


(window_size = 5)

The training loss is as follows:


import matplotlib.pyplot as plt
# plot training loss against the learning rates explored by the scheduler
plt.semilogx(history.history["lr"], history.history["loss"])
plt.axis([1e-8, 1e-4, 0, 30])


[Figure: Training loss plotted against learning rate]

Source: Jupyter Notebook Output


Here is a visual of the forecasts versus actual daily cancellation values:


import numpy as np

# model_forecast and plot_series are helper functions from the template
# (a sketch of model_forecast is given further below)
rnn_forecast = model_forecast(model, series[:, np.newaxis], window_size)
rnn_forecast = rnn_forecast[split_time - window_size:-1, -1, 0]
plt.figure(figsize=(10, 6))
plot_series(time_valid, x_valid)       # actual daily cancellations
plot_series(time_valid, rnn_forecast)  # model forecast


[Figure: Forecast vs. actual daily cancellations, window_size = 5]

Source: Jupyter Notebook Output
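The model_forecast helper used above is not defined in this excerpt; it comes from the same Udacity/TensorFlow template that the model is based on. A sketch of what such a helper looks like, under that assumption:

import tensorflow as tf

def model_forecast(model, series, window_size):
    # slide a window of length window_size over the series, one step at a time,
    # and predict an output sequence for each window
    ds = tf.data.Dataset.from_tensor_slices(series)
    ds = ds.window(window_size, shift=1, drop_remainder=True)
    ds = ds.flat_map(lambda w: w.batch(window_size))
    ds = ds.batch(32).prefetch(1)
    return model.predict(ds)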

The mean absolute error is calculated:


>>> keras.metrics.mean_absolute_error(x_valid, rnn_forecast).numpy()
9.113908

Given that the mean across the validation set is 19.89, a mean absolute error of roughly 9.11 represents reasonable accuracy. However, we can see from the plot above that the model falls short when it comes to forecasting the more extreme values.


(window_size = 30)

What if the window size was increased to 30?


The mean absolute error decreases slightly:


>>> keras.metrics.mean_absolute_error(x_valid, rnn_forecast).numpy()
7.377962

As mentioned, the stride length can be set higher if we wish to smooth out the forecast, with the caveat that such a forecast (the output sequence) will have fewer data points than the input sequence.


(Forecasting without LSTM layer)

Unlike an LSTM, a CNN is not recurrent, which means that it does not retain memory of previous time series patterns. Instead, it can only learn from the data presented to it at a particular time step.


However, by stacking several Conv1D layers together, it is in fact possible for a convolutional neural network to effectively learn long-term dependencies in the time series.


This can be done using a WaveNet architecture. Essentially, this means that the model defines every layer as a 1D convolutional layer with a stride length of 1 and a kernel size of 2. The second convolutional layer uses a dilation rate of 2, which means that every second input timestep in the series is skipped. The third layer uses a dilation rate of 4, the fourth layer uses a dilation rate of 8, and so on.


The reason for this is that it allows the lower layers to learn short-term patterns in the time series, while the higher layers learn longer-term patterns.


The WaveNet model is defined as follows:


model = keras.models.Sequential()
model.add(keras.layers.InputLayer(input_shape=[None, 1]))
# stack of dilated causal convolutions: the dilation rate doubles at each layer,
# so lower layers learn short-term patterns and higher layers learn
# longer-term patterns
for dilation_rate in (1, 2, 4, 8, 16, 32):
    model.add(
      keras.layers.Conv1D(filters=32,
                          kernel_size=2,
                          strides=1,
                          dilation_rate=dilation_rate,
                          padding="causal",
                          activation="relu")
    )
# final 1x1 convolution maps the 32 channels down to a single forecast value
model.add(keras.layers.Conv1D(filters=1, kernel_size=1))
optimizer = keras.optimizers.Adam(learning_rate=3e-4)
model.compile(loss=keras.losses.Huber(),
              optimizer=optimizer,
              metrics=["mae"])

# save the best model seen so far and stop once validation loss
# has not improved for 50 epochs
model_checkpoint = keras.callbacks.ModelCheckpoint(
    "my_checkpoint.h5", save_best_only=True)
early_stopping = keras.callbacks.EarlyStopping(patience=50)
history = model.fit(train_set, epochs=500,
                    validation_data=valid_set,
                    callbacks=[early_stopping, model_checkpoint])

A window size of 64 is used in training the model. In this instance, we are using a larger window size than was used with the CNN-LSTM model, in order to ensure that the CNN model picks up longer-term dependencies.

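The train_set and valid_set objects passed to model.fit above are tf.data datasets built by windowing the series. They are not shown in this excerpt; a sketch of the kind of windowing helper assumed here, following the same template:

import tensorflow as tf

def seq2seq_window_dataset(series, window_size, batch_size=32,
                           shuffle_buffer=1000):
    # turn a 1-D series into (input window, target window shifted by one step) pairs
    series = tf.expand_dims(series, axis=-1)
    ds = tf.data.Dataset.from_tensor_slices(series)
    ds = ds.window(window_size + 1, shift=1, drop_remainder=True)
    ds = ds.flat_map(lambda w: w.batch(window_size + 1))
    ds = ds.shuffle(shuffle_buffer)
    ds = ds.map(lambda w: (w[:-1], w[1:]))
    return ds.batch(batch_size).prefetch(1)

# e.g. train_set = seq2seq_window_dataset(x_train, window_size=64)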

Note that early stopping is used when training the neural network. The purpose of this is to halt training at the point where further training would lead to overfitting. Determining that point manually is quite arbitrary, so early stopping greatly assists with this.


Let’s now generate forecasts using the standalone CNN model that we just built.


cnn_forecast = model_forecast(model, series[..., np.newaxis], window_size)
cnn_forecast = cnn_forecast[split_time - window_size:-1, -1, 0]

Here is a plot of the forecasted vs. actual data.



[Figure: Forecast vs. actual daily cancellations, standalone CNN]

Source: Jupyter Notebook Output

The mean absolute error came in slightly higher at 7.49.


Note that for both models, the Huber loss was used as the loss function. This type of loss tends to be more robust to outliers, in that it is quadratic for smaller errors and linear for larger ones.


This type of loss is suitable for this scenario, as we can see that some outliers are present in the data. Using MSE (mean squared error) would overly inflate the forecast error yielded by the model, whereas MAE on its own would likely underestimate the size of the error by not taking such outliers into account. The use of a Huber loss function allows for a happy medium.

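As a toy comparison (made-up numbers, not taken from the model), the effect of a single outlier on each loss can be seen directly:

import numpy as np
from tensorflow import keras

y_true = np.array([10., 12., 11., 60.])  # the last value is an outlier
y_pred = np.array([11., 12., 10., 15.])

print(keras.losses.MeanSquaredError()(y_true, y_pred).numpy())   # heavily inflated by the outlier
print(keras.losses.MeanAbsoluteError()(y_true, y_pred).numpy())  # treats all errors linearly
print(keras.losses.Huber()(y_true, y_pred).numpy())              # quadratic for small errors, linear for large ones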

>>> keras.metrics.mean_absolute_error(x_valid, cnn_forecast).numpy()
7.490844

Even with a slightly higher MAE, the CNN model has performed quite well in forecasting daily hotel cancellations, without having to be combined with an LSTM layer in order to learn long-term dependencies.


(Conclusion)

In this example, we have seen:


  • The similarities and differences between CNNs and LSTMs in forecasting time series
  • How dilated convolutions assist CNNs in forecasting time series
  • Modification of kernel size, padding and strides in forecasting a time series with CNN
  • Use of a WaveNet architecture to conduct a time series forecast using stand-alone CNN layers

In particular, we saw how a CNN can produce results comparable to those of a CNN-LSTM model through the use of dilation.


Many thanks for your time, and any questions, suggestions or feedback are greatly appreciated.


As mentioned, this topic is also covered in the Intro to TensorFlow for Deep Learning course from Udacity — I highly recommend the chapter on Time Series Forecasting for further detail on this topic.


You can also find the full Jupyter Notebook that I used for running this example on hotel cancellations here.


The original Jupyter Notebook (Copyright 2018, The TensorFlow Authors) can also be found here.



Translated from: https://towardsdatascience.com/cnn-lstm-predicting-daily-hotel-cancellations-e1c75697f124
