Today, we’re incredibly excited to announce the general availability of Vertex Pipelines.
One of the best ways to scale your machine learning (ML) workflows is to run them as a pipeline, where each pipeline step is a distinct piece of your ML process. Pipelines are a great tool for productionizing, sharing, and reliably reproducing ML workflows across your organization. They are also the key to MLOps – with Pipelines you can build systems to automatically retrain and deploy models. In this post, we’ll show what you can do with Vertex Pipelines, and we’ll end by sharing a sample pipeline to help you get started.
Vertex Pipelines in a nutshell
Let’s briefly discuss what an ML pipeline does. A machine learning pipeline is an ML workflow encapsulated as a series of steps, also called components. Each step in a pipeline is a container, and the output of each step can be an input to the next step. This presents two challenges:
For this to work, you need a way to convert individual pipeline steps to containers
This will require setting up infrastructure to run your pipeline at scale
To address the first challenge, there are some great open source libraries that handle converting pipeline steps to containers and managing the flow of input and output artifacts throughout your pipeline, allowing you to focus on building out the functionality of each pipeline step. Vertex Pipelines supports two popular open source libraries – Kubeflow Pipelines (KFP) and TensorFlow Extended (TFX). This means you can define your pipeline using one of these libraries and run it on Vertex Pipelines.
Second, Vertex Pipelines is entirely serverless. When you upload and run your KFP or TFX pipelines, Vertex AI will handle provisioning and scaling the infrastructure to run your pipeline. This means you’ll only pay for the resources used while your pipeline runs; and your data scientists get to focus on ML without worrying about infrastructure.
Vertex Pipelines integrates with other tools in Vertex AI and Google Cloud: you can import data from BigQuery, train models with Vertex AI, store pipeline artifacts in Cloud Storage, get model evaluation metrics, and deploy models to Vertex AI endpoints, all within your Vertex Pipeline steps.
To make this easy, we’ve created a library of pre-built components for Vertex Pipelines. These pre-built components help simplify the process of using other parts of Vertex AI in your pipeline steps, like creating a dataset or training an AutoML model. To use them, import the pre-built component library and use components from the library directly in your pipeline definition.
As an example, below is a pipeline that creates a Vertex AI dataset pointing to data in BigQuery, trains an AutoML model, and deploys the trained model to an endpoint if its accuracy is above a certain threshold: