AI Model Training Is Twice as Fast as It Was a Year Ago, Thanks to New Technology

A study of the time it takes to complete a set of commonly used machine-learning training benchmarks, collectively called "MLPerf," has revealed that these tests can now be completed twice as fast as they could a year ago, a pace of improvement that outstrips Moore's Law. Most of the gain has come from software and systems innovations, but this year also gave the first peek at what some new processors aimed specifically at machine-learning workloads, from Graphcore and Intel subsidiary Habana Labs, can do.

The once-crippling time it took to train a neural network to perform its task is the problem that drove companies like Google to develop machine-learning accelerator chips in house. But the new MLPerf data show that training times for standard neural networks have fallen dramatically in a short period.

This capability has incentivized machine-learning experts to dream big, and as a result the size of new neural networks continues to outpace gains in computing power. Called by some "the Olympics of machine learning," MLPerf consists of eight benchmark tests: image recognition, medical-imaging segmentation, two versions of object detection, speech recognition, natural-language processing, recommendation, and a form of gameplay called reinforcement learning.

Computers and software from 21 companies and institutions competed on any or all of the tests. This time around, in the round officially called MLPerf Training 2.0, they collectively submitted 250 results. As usual, systems built using Nvidia's Ampere A100 GPUs dominated. Nvidia's new "Hopper" H100 GPU was designed with architectural features aimed at speeding up training, but its release came too close to these tests for it to be included.