Businesses in almost every industry are adopting Machine Learning (ML) technology. Businesses look towards ML Infrastructure platforms to help them best leverage artificial intelligence (AI).

Understanding the various platforms and offerings can be a challenge. The ML Infrastructure space is crowded, confusing, and complex. There are a number of platforms and tools, which each have a variety of functions across the model building workflow.

To understand the ML infrastructure ecosystem, we can broadly segment the machine learning workflow into three stages: data preparation, model building, and production. Data preparation refers to the processing and augmentation of data for use by models, model building refers to training and validating models on that data, and production integrates model predictions into the business.

Understanding the goals and challenges of each stage of the workflow can help you make the most informed decision on what ML Infrastructure platforms out there are best suited for your business’ needs.

ML Workflow Stages Diagram by Author

Each of these stages of the Machine Learning workflow (Data Preparation, Model Building, and Production) have a number of vertical functions. Some platforms cover all three functions across the ML workflow, while other platforms focus on single functions (for example, experiment tracking or hyperparameter tuning).

In our previous posts, we examined the Data Preparation and Model Building parts of the ML workflow. We began diving into Production ML and discussed model validation at length. In this post, we will dive deeper into Production and focus on model deployment and serving.

(Model Deployment and Serving)

Once the model has been trained, built, and validated, it is finally time to deploy and serve it! In this last step of the ML workflow, all of the work of the previous steps is finally put to use by a data-driven model.

The first decision that teams need to make is whether they should build a model server at all. Most models deployed in the last five years were served by home-built approaches. In recent years, however, companies working with ML models have moved away from building everything from scratch. In fact, given the number of model servers coming to market, we predict that the build-everything-from-scratch approach will fade drastically going forward.

Model serving options typically fall into a few different types:

  • Internally built executable (PKL File/Java) — containerized & non-containerized
  • Cloud ML Provider — Amazon SageMaker, Azure ML, Google AI
  • Batch or Stream: Hosted & On-Prem — Algorithmia, Spark/Databricks, Paperspace
  • Open Source — TensorFlow Serving, Kubeflow, Seldon, Anyscale, etc.

Which of these is the right choice for a given team? There are a number of considerations in choosing a model serving option. Here are a few questions teams ask themselves to determine which option best fits their needs:

(Key Questions to Consider)

  1. What are the data security requirements of the organization?

On-premise ML solutions may be required for organizations with strict data security requirements. Some good choices are Algorithmia, Seldon, TensorFlow Serving, Kubeflow, or home-built proprietary solutions. Some providers, such as Algorithmia, have security-specific feature sets, detailed below. Cloud solutions may be a better choice for organizations that need less security but more remote access/virtualization.

2. Does the team want managed or unmanaged solutions for model serving?

Managed solutions such as Algorithmia, SageMaker, Google ML, Azure, and Paperspace are a good idea for companies with a low IT presence. An unmanaged solution such as Kubeflow, Seldon, TensorFlow Serving, or Anyscale may be better for more technical organizations.

3. Is every team in the organization going to use the same deployment option?

Even if one team chooses a serving option, rarely does the whole organization use the same serving approach. Having a common model management platform like MLflow can still help bridge the gap.

4. What does the final model look like? Is there an already established interface?

If a model is already deployed, it might not make sense to rip out the model serving system and replace it with a new model server. How easy it would be to replace the already-deployed model might depend on the model server that was chosen and its integrations with other systems, APIs, and Feature Pipelines.

5. Where does the model executable live? (for example, MLflow or an S3 bucket)

Easy integration with MLflow or other model storage systems is an important consideration.

6. Is GPU inference needed?

If performance requirements call for inference on GPU servers, you will likely be driven either to a cloud provider or, for on-premises deployments, to Algorithmia.

7. Are there separate feature generation pipelines or are they integrated into the model server?

If your feature pipelines are already deployed somewhere, say, Amazon Web Services (AWS), that might direct you toward using SageMaker. Data already living in AWS is probably one of the more common reasons to use SageMaker.

Deployment Details

The format of the model can vary across organizations and projects, based on the frameworks used to build it. Some example formats include a pickled image of the classifier's weights/parameters, a TensorFlow SavedModel object, a PyTorch model, a Keras model, an XGBoost model, Apache TVM, MXNet, ONNX Runtime, etc.
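As a concrete illustration of the pickle route, here is a minimal, purely illustrative sketch: `TinyModel` is a stand-in for a real trained classifier (in practice this would be, say, a scikit-learn estimator), but the serialize/deserialize mechanics are the same.

```python
import pickle

# Stand-in for a trained classifier: in practice this would be a
# scikit-learn estimator, but any picklable object works the same way.
class TinyModel:
    def __init__(self, weights):
        self.weights = weights

    def predict(self, features):
        # Simple weighted sum plus threshold, purely illustrative.
        score = sum(w * x for w, x in zip(self.weights, features))
        return 1 if score > 0 else 0

model = TinyModel(weights=[0.5, -0.25, 1.0])

# Serialize the trained artifact to bytes (a "pickle image" of the model).
artifact = pickle.dumps(model)

# Later, the serving layer deserializes the artifact and uses it.
restored = pickle.loads(artifact)
print(restored.predict([1.0, 2.0, 0.1]))  # prints 1
```

The serving layer only needs the artifact bytes and the class definition; the same pattern applies whether the artifact lives in an object store or a model registry.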

Implementation

There are many ways that ML models can be implemented. They can be integrated into a larger system's codebase, deployed as a microservice, or even live on a device. If the model is code integrated into a larger system, the interface into the model is simply a function call. If the model runs as its own service/executable or server, it can be seen as a service, with well-defined APIs or interfaces to pass inputs to the model and get responses. The model servers described above take the trained model artifact, in one of the formats above, and let you deploy it to a containerized model server that exposes well-defined APIs.
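To make the two integration styles concrete, here is a minimal sketch (the names and the trivial model are illustrative, not from any particular framework) contrasting an in-process function call with a service-style handler behind a JSON request/response contract:

```python
import json

# In-process integration: the model is just a function call.
def model_predict(features):
    # Stand-in for a real trained model's scoring function.
    return sum(features)

# Service-style integration: a handler with a well-defined request/response
# contract. A real deployment would put this behind an HTTP server
# (Flask, FastAPI, a model server's built-in API, etc.).
def handle_request(body: bytes) -> bytes:
    payload = json.loads(body)
    prediction = model_predict(payload["features"])
    return json.dumps({"prediction": prediction}).encode()

response = handle_request(b'{"features": [1.0, 2.0, 3.0]}')
print(response)  # prints b'{"prediction": 6.0}'
```

The point is the interface: in-process callers pass native objects, while service callers go through a serialized, language-agnostic API.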

Containerization

A modern serving approach is to containerize the model executable so there is a common interface into models and a common way of standing them up. When the model is deployed, it is pulled from the model management system (such as MLflow) into a container. There are many ways to accomplish this: building a custom container for your company, using open-source solutions like Kubeflow or Seldon, or using common cloud provider tools such as Algorithmia, SageMaker, Azure ML, or Google AI.

Real-Time or Batch Model

Another important deployment consideration is whether to have a real-time/online model or a batch model. Online models are used when predictions need to be immediate and take in real-time application input. If this isn't a requirement, batch inference can be appropriate.

A number of serving platforms let you build a single model and choose among different deployment options (batch or real-time) to support both serving regimes.
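The two regimes can be sketched with the same underlying scoring function; the trivial model below is purely a stand-in:

```python
def predict(features):
    # Stand-in for a trained model's scoring function.
    return 1 if sum(features) > 0 else 0

# Real-time / online serving: score one request as it arrives,
# typically behind a low-latency API.
def predict_online(features):
    return predict(features)

# Batch serving: score a whole dataset on a schedule (e.g. nightly),
# writing predictions out for downstream consumers.
def predict_batch(rows):
    return [predict(features) for features in rows]

print(predict_online([0.5, -0.1]))            # prints 1
print(predict_batch([[0.5, -0.1], [-2.0]]))   # prints [1, 0]
```

The model is identical in both paths; only the invocation pattern (per-request versus per-dataset) changes, which is why platforms can offer both options for a single model.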

(Things to Look for in Model Servers)

Easy Scale Out:

As an application's prediction volume grows, the initial approach of having a single server support predictions can easily get overwhelmed. The ability to simply add servers to a prediction service, without re-architecting or generating a large amount of additional model operations work, is one of the more useful features of a model server.

Canary A/B Framework:

A canary A/B framework allows developers to roll out software to a small subset of users and A/B test which aspects of the software are most useful and provide the best functionality. Once a new model is deployed, some teams run it as an A/B (canary) model side-by-side with the production model, initially predicting on only a small subset of traffic. This serves as a simple test before rolling the new model out across the full volume of predictions. Many teams we talk to have built their own A/B testing framework. That said, some model server solutions also support easy A/B deployments out of the box, for example, choosing the percentage of traffic sent to the B model with the click of a button.
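A home-built traffic split of the kind described above can be as simple as the following sketch (the function names and the 5% default are illustrative, not from any particular platform):

```python
import random

def route_prediction(features, model_a, model_b, b_fraction=0.05,
                     rng=random.random):
    """Route a small, configurable slice of traffic to the canary (B) model."""
    if rng() < b_fraction:
        return "B", model_b(features)
    return "A", model_a(features)

# Toy models standing in for the production (A) and canary (B) versions.
production_model = lambda features: 0
canary_model = lambda features: 1

counts = {"A": 0, "B": 0}
for _ in range(10_000):
    arm, _prediction = route_prediction([1.0], production_model, canary_model)
    counts[arm] += 1
print(counts)  # roughly a 95% / 5% split between A and B
```

Real frameworks also log which arm served each request so that the two models' outcomes can be compared before ramping the canary up.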

Ensemble Support:

The ability to co-locate multiple models in the same server, or to easily connect the prediction (inference) flow between models, might be an important consideration. Most of the time, the model response is consumed by the end application, but as systems get more complex, some models' outputs become inputs to another model. When fast prediction responses are needed, co-locating models can be desirable.
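A minimal sketch of chained inference, where one stand-in model's output feeds another's input (both "models" here are toy functions for illustration):

```python
# Stage 1: a stand-in "embedding" model that turns raw input into features.
def embedding_model(raw_text):
    return [float(len(raw_text)), float(raw_text.count(" "))]

# Stage 2: a stand-in classifier that consumes the first model's output.
def classifier_model(embedding):
    return "long" if embedding[0] > 10 else "short"

def ensemble_predict(raw_text):
    # One model's output is the next model's input. Co-locating both
    # stages in the same server avoids a network hop between them.
    return classifier_model(embedding_model(raw_text))

print(ensemble_predict("a short example input"))  # prints "long"
```

If the two stages lived in separate services, each hand-off would add serialization and network latency, which is exactly why co-location matters for fast prediction responses.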

Fall Back Support:

As you deploy a new model into production, you might find that performance drops drastically. The ability to fall back to a different model, perhaps a previous version or a very simple model, during periods of degraded performance can be very helpful in situations like this.
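A simple fallback wrapper might look like the following sketch (the names are illustrative); a real system would typically also trigger the fallback on timeouts or monitored performance degradation, not just on exceptions:

```python
def predict_with_fallback(features, primary, fallback):
    """Serve from the primary model; fall back to a simpler or previous
    model if the primary fails."""
    try:
        return primary(features)
    except Exception:
        # In production you would also log/alert on the failure here.
        return fallback(features)

def broken_model(features):
    # Simulates a newly deployed model that is misbehaving.
    raise RuntimeError("model degraded")

def simple_model(features):
    # A safe, conservative default (e.g. a previous version or baseline).
    return 0

print(predict_with_fallback([1.0], broken_model, simple_model))  # prints 0
```

The key design choice is that callers always get an answer; the serving layer absorbs the failure and degrades gracefully instead of returning errors.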

Security:

If security is extremely important to the organization, some platforms have very well thought-out security feature sets. These span a set of security requirements focused on access rights, application security, network security, and memory security. The model in production needs to grab data/inputs from somewhere in the system, and it needs to generate predictions/outputs used by other systems. The application that has access to the predictions might not have rights to the input data. Also, if the application is using Python packages in a Kubernetes-hosted model, many companies want to make certain that those packages are not public packages. Lastly, if you are running in a shared-memory environment like a GPU, you will need to take stock of what data protections you have in place around memory encryption and access. Some platforms, such as Algorithmia, have more developed security feature sets that provide solutions for a myriad of situations.

Feature Pipeline Support:

In containerized solutions, input-to-feature transformations may reside in the container itself or in separate feature transformation pipelines. The larger the infrastructure, the more likely the input-to-feature transformation is a pipeline or feature store system, with inputs to the container arriving pre-processed.
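The two placements can be sketched as follows, with a toy transformation and model standing in for real ones (all names here are illustrative):

```python
# The input-to-feature transformation: this logic can live inside the
# serving container, or upstream in a pipeline / feature store.
def featurize(raw):
    return [raw["amount"] / 100.0, 1.0 if raw["country"] == "US" else 0.0]

def model(features):
    # Stand-in for the real model's scoring function.
    return sum(features)

# Option 1: transformation co-located with the model in the container.
def predict_colocated(raw):
    return model(featurize(raw))

# Option 2: features computed upstream; the container receives them
# pre-processed and only runs the model.
def predict_preprocessed(features):
    return model(features)

raw = {"amount": 250, "country": "US"}
assert predict_colocated(raw) == predict_preprocessed(featurize(raw))
```

Either way the math is identical; the trade-off is operational: co-location keeps training and serving transformations in sync, while a shared upstream pipeline or feature store avoids duplicating the logic across many models.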

In the serving layer, there are also some new platforms, such as Tecton AI, focused on feature serving. A global feature store allows teams to easily deploy the feature pipeline directly into production environments, minimizing feature pipeline mistakes and letting teams take advantage of cross-company feature builds.

Monitoring:

Some model servers support basic monitoring of serving infrastructure, memory usage, and other operational aspects. Our view is that this type of raw model ops monitoring and visualization is important for scaling models out, but it is not observability. We are obviously biased, but our opinion is that true model observability is really a separate platform.

Example ML Infrastructure platforms for deployment and serving include DataRobot, H2O.ai, SageMaker, Azure, Google Kubeflow, and Tecton AI.

(Model Observability vs Model Monitoring)

It may seem like anyone can do monitoring: green lights are good, red lights are bad. You can set alerts, and if a value falls below a certain level, this triggers sending an email to the staff.

Yet, if that were the case, Amazon CloudWatch would have killed Datadog.

The issue here is: what do you do when you get that alert?

Our opinion is that the difference between a monitoring solution and an observability platform is the ability to troubleshoot and get to the bottom of issues seamlessly. In the ML ecosystem, these problems surface as the challenge of linking AI research to engineering. Is the platform designed from the bottom up to troubleshoot problems, or was an alerting system tacked onto some pre-existing graphs? Troubleshooting models and data in the real world is a large and complex space. That's why observability platforms are designed from the ground up to help research and engineering teams jointly tackle these problems.

(Why the Model Server Is Not a Great Spot for Observability)

The model server does not have the right data points to link the complex layers needed to analyze models. It is missing essential data such as training data, test runs, pre-one-hot-encoded feature data, truth/label events, and much more. In the case of feature data, for a number of larger models we have worked on, the insertion point into the data pipeline for troubleshooting is a very different technology than the model server. Lastly, many organizations have as many model serving approaches as they have models in production, and it is very unlikely they will move to a single server to rule them all. What happens when you have a mix of models, served in different ways, that feed each other data, but you want a cohesive picture?

It’s the same in software infrastructure; your infrastructure observability solutions are not tied to the infrastructure itself.

(Up Next)

We hope you enjoyed the ML Infrastructure series! Up next, we will be diving deeper into Production AI. There are so many under-discussed and extremely important topics on operationalizing AI that we will be diving into!

(Contact Us)

To read more of our thoughts on the potential of AI and ML, follow us on Twitter and Medium!

Arize AI is laser-focused on Production ML. If you’d like to hear more about what we’re doing at Arize AI, reach out to us at contacts@arize.com. If you’re interested in joining a fun, rockstar engineering crew to help make models successful in production, reach out to us at jobs@arize.com!