Google showcases Cloud TPU v4 Pods for large model training

The performance numbers demonstrated by our submission also rely on our XLA linear algebra compiler and leverage the Lingvo framework. XLA transparently performs a number of optimizations, including GSPMD-based automatic parallelization of many of the computation graphs that form the building blocks of the ML model. XLA also reduces latency by overlapping communication with computation. Our two submissions demonstrate the versatility and performance of our software stack across two frameworks, TensorFlow and JAX.
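As a rough illustration of the GSPMD-style automatic parallelization described above, here is a minimal JAX sketch: the user annotates only the input sharding, and the compiler propagates the partitioning through the rest of the computation. The function and array shapes below are purely illustrative assumptions, not part of Google's submission; on a local CPU the mesh degenerates to a single device, while on a TPU pod slice the same code shards across all chips.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec

# Build a 1-D device mesh over whatever accelerators are available
# (one CPU device locally; many chips on a TPU pod slice).
mesh = Mesh(np.array(jax.devices()), axis_names=("data",))

# Shard the batch dimension of the input across the mesh; the XLA/GSPMD
# partitioner propagates this sharding through the whole graph.
sharding = NamedSharding(mesh, PartitionSpec("data"))

@jax.jit
def predict(w, x):
    # A toy layer standing in for a real model's computation graph.
    return jnp.tanh(x @ w)

x = jax.device_put(jnp.ones((8, 4)), sharding)  # batch sharded over "data"
w = jnp.ones((4, 2))                            # weights replicated
y = predict(w, x)
print(y.shape)  # (8, 2)
```

The key design point is that `predict` itself contains no communication code: the compiler inserts (and overlaps) any needed collectives based on the declared shardings.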

Large models in MLPerf

Google’s submissions represent a class of models that has become increasingly important in ML research and production, but that is currently not represented in MLPerf’s Closed division benchmark suite.

We believe that adding these models to the benchmark suite is an important next step and can inspire the ML systems community to focus on addressing the scalability challenges that large models present.

Our submissions demonstrate 63% computational efficiency, which is cutting-edge in the industry. This high efficiency enables faster training and therefore higher experimentation velocity, which translates directly into cost savings for Google’s Cloud TPU customers.
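"Computational efficiency" in this context is commonly computed as achieved model FLOP/s divided by the aggregate peak FLOP/s of the hardware. The sketch below shows that arithmetic with made-up step counts and a chip count chosen purely for illustration; only the ~275 TFLOPS bf16 peak per TPU v4 chip is a published figure, and none of the numbers reproduce Google's actual measurements.

```python
def computational_efficiency(model_flops_per_step, step_time_s,
                             num_chips, peak_flops_per_chip):
    """Achieved FLOP/s as a fraction of the system's peak FLOP/s."""
    achieved = model_flops_per_step / step_time_s
    peak = num_chips * peak_flops_per_chip
    return achieved / peak

# Illustrative values only (not Google's published measurements):
eff = computational_efficiency(
    model_flops_per_step=1.0e18,  # FLOPs required for one training step
    step_time_s=5.8,              # measured wall-clock seconds per step
    num_chips=1000,               # chips in the pod slice
    peak_flops_per_chip=275e12,   # TPU v4 peak bf16 throughput (~275 TFLOPS)
)
print(f"{eff:.0%}")  # 63%
```

Because the numerator counts only the model's useful FLOPs, overheads such as recomputation, communication stalls, and input pipeline gaps all show up as lost efficiency, which is why sustaining ~63% at pod scale is notable.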

Please visit the Cloud TPU homepage and documentation to learn more about leveraging Cloud TPUs using TensorFlow, PyTorch, and JAX.

1. The MLPerf name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See the MLCommons site for more information.
2. Computational efficiency and end-to-end training time are not official MLPerf metrics.