This article is the next installment in the series on PyTorch on Google Cloud using Vertex AI. In the preceding article, we fine-tuned a Hugging Face Transformers model for a sentiment classification task using PyTorch on the Vertex Training service. In this post, we show how to deploy a PyTorch model on the Vertex Prediction service to serve predictions from trained model artifacts.
Now let’s walk through deploying a PyTorch model to a Vertex Endpoint, using TorchServe packaged as a custom container to serve the model artifacts. You can find the accompanying code for this blog post in the GitHub repository and the Jupyter Notebook.
Deploying a PyTorch Model on Vertex Prediction Service
Vertex Prediction service is Google Cloud’s managed model serving platform. As a managed service, the platform handles infrastructure setup, maintenance, and management. Vertex Prediction supports both CPU and GPU inference and offers a selection of n1-standard machine shapes in Compute Engine, letting you customize the scale unit to fit your requirements. Vertex Prediction service is the most effective way to deploy your models to serve predictions for the following reasons:
- Simple: Vertex Prediction service simplifies model serving with pre-built prediction containers that require you only to specify where your model artifacts are stored.
- Flexible: With custom containers, Vertex Prediction lowers the abstraction level so that you can choose whichever ML framework, model server, preprocessing, and post-processing you need.
- Assistive: Built-in tooling to track performance of models and explain or understand predictions.
TorchServe is the recommended framework to deploy PyTorch models in production. TorchServe’s CLI makes it easy to deploy a PyTorch model locally, and TorchServe can also be packaged as a container that the Vertex Prediction service scales out. The custom container capability of Vertex Prediction provides a flexible way to define the environment in which the TorchServe model server runs.
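To make the CLI workflow concrete, here is a minimal sketch that assembles the two commands typically involved in serving a model locally with TorchServe: `torch-model-archiver` to package the artifacts into a `.mar` file, and `torchserve` to start the server. The helper function names, the model name, and every file path are hypothetical placeholders chosen for illustration; they are not taken from the accompanying notebook. The sketch only builds and prints the commands, so it runs without TorchServe installed.

```python
import shlex


def build_archiver_command(model_name, serialized_file, handler,
                           export_path, version="1.0", extra_files=None):
    """Assemble a torch-model-archiver invocation as an argument list."""
    cmd = [
        "torch-model-archiver",
        "--model-name", model_name,
        "--version", version,
        "--serialized-file", serialized_file,
        "--handler", handler,
        "--export-path", export_path,
    ]
    if extra_files:
        # The archiver accepts extra files as one comma-separated string.
        cmd += ["--extra-files", ",".join(extra_files)]
    return cmd


def build_serve_command(model_store, model_name):
    """Assemble a torchserve invocation that loads the archived model."""
    return [
        "torchserve",
        "--start",
        "--model-store", model_store,
        "--models", f"{model_name}={model_name}.mar",
    ]


if __name__ == "__main__":
    # All names and paths below are hypothetical placeholders.
    archive = build_archiver_command(
        model_name="sentiment",
        serialized_file="artifacts/pytorch_model.bin",
        handler="handler.py",
        export_path="model-store",
        extra_files=["artifacts/config.json", "artifacts/vocab.txt"],
    )
    serve = build_serve_command("model-store", "sentiment")
    print(shlex.join(archive))
    print(shlex.join(serve))
```

In a custom container, the same two steps would usually live in the image build and the container entrypoint respectively, which is one reasonable way to hand the environment definition to Vertex Prediction.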
In this blog post, we deploy a container running a TorchServe model server on the Vertex Prediction service to serve predictions from a fine-tuned transformer model from Hugging Face for the sentiment classification task. You can then send input requests with text to a Vertex Endpoint to classify sentiment as positive or negative.
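As a rough illustration of what such a request might look like, the sketch below builds a JSON body using the `instances` envelope that Vertex online prediction expects and reads the `predictions` field from a response. The shape of each instance (base64-encoded text under `data.b64`) and the shape of each prediction are assumptions that depend entirely on how the TorchServe handler is written; the helper names and the mocked response are hypothetical. No network call is made.

```python
import base64
import json


def build_request(texts):
    """Wrap input texts in a Vertex-style {"instances": [...]} JSON body.

    Assumption: the serving handler expects each instance as
    {"data": {"b64": <base64-encoded UTF-8 text>}}.
    """
    instances = []
    for text in texts:
        encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
        instances.append({"data": {"b64": encoded}})
    return json.dumps({"instances": instances})


def parse_response(body):
    """Return the list under the "predictions" key of a response body."""
    return json.loads(body)["predictions"]


if __name__ == "__main__":
    request_body = build_request(["The movie was great!"])
    print(request_body)

    # Mocked response for illustration only; a real handler may return
    # a different structure for each prediction.
    mock_response = json.dumps({"predictions": ["Positive"]})
    print(parse_response(mock_response))
```

Encoding the text as base64 is one common way to pass arbitrary strings through a JSON payload safely; a handler that accepts plain strings would not need it.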