In the snippet above, notice that the encoder (also referred to as the base model) weights are not frozen; this is why a very small learning rate (2e-5) is chosen, to avoid destroying the pre-trained representations. The learning rate and other hyperparameters are captured in the TrainingArguments object. During training we capture only accuracy; you can modify the compute_metrics function to compute and report other metrics.
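As a concrete illustration, here is a minimal sketch of an extended compute_metrics function that reports precision, recall, and F1 alongside accuracy. It assumes the binary sentiment setup from this post (class 1 treated as positive) and uses plain NumPy rather than an external metrics library; the Trainer passes it a (logits, labels) pair at evaluation time.

```python
import numpy as np

def compute_metrics(eval_pred):
    # The Hugging Face Trainer passes (logits, labels) at evaluation time.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    accuracy = float((preds == labels).mean())
    # Binary precision/recall/F1, treating class 1 as the positive class
    # (an assumption for this sentiment task).
    tp = int(((preds == 1) & (labels == 1)).sum())
    fp = int(((preds == 1) & (labels == 0)).sum())
    fn = int(((preds == 0) & (labels == 1)).sum())
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Passing this function as the compute_metrics argument of the Trainer makes the extra metrics appear in every evaluation log without any other code changes.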
We will explore integration with Cloud AI Platform Hyperparameter Tuning Service in the next post of this series.
Training the model on Cloud AI Platform
While you can experiment locally on your AI Platform Notebooks instance, larger datasets or models often require vertically scaled compute resources or horizontally distributed training. The most effective way to perform such tasks is the AI Platform Training service: it creates the designated compute resources the job requires, runs the training task, and deletes those resources once the training job finishes.
Before running the training application with AI Platform Training, the training application code with its required dependencies must be packaged and uploaded to a Cloud Storage bucket that your Google Cloud project can access. There are two ways to package the application and run it on AI Platform Training:
- Package application and Python dependencies manually using Python setup tools
- Use custom containers to package dependencies using Docker containers
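With the first approach, submitting the packaged application typically looks like the following sketch of a `gcloud ai-platform jobs submit training` invocation. The job name, bucket, package path, and module name here are placeholders for illustration; the region and runtime/Python versions should match your project's setup.

```
gcloud ai-platform jobs submit training my_training_job \
  --region=us-central1 \
  --staging-bucket=gs://your-bucket \
  --package-path=trainer \
  --module-name=trainer.task \
  --runtime-version=2.3 \
  --python-version=3.7
```

The service packages the code under --package-path, uploads it to the staging bucket, and runs the module named by --module-name on the provisioned machines.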
Using Python packaging to build manually
For this sentiment classification task, we have to package the training code together with its standard Python dependencies – tqdm – in the setup.py file. The find_packages() function inside setup.py discovers the training code packages so that they are included in the distribution.
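A minimal setup.py for such a training package might look like the following sketch; the package name and version are assumptions for illustration, and only the tqdm dependency mentioned above is listed.

```
from setuptools import find_packages, setup

setup(
    name="trainer",             # hypothetical package name
    version="0.1",
    packages=find_packages(),   # discovers the training code packages automatically
    install_requires=["tqdm"],  # standard Python dependencies for the training code
)
```

Any package listed in install_requires is installed on the training machines before the job starts, so the training code can import it without further setup.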