System Reduces the Energy Needed for Training, Running Neural Networks

Although artificial intelligence has turned out to be a spotlight of some moral concerns, it also has some significant sustainability problems.

MIT researchers have developed a new automated AI system with improved computational efficiency and a much smaller carbon footprint. The researchers’ system trains one large neural network comprising many pretrained subnetworks of different sizes that can be tailored to diverse hardware platforms without retraining. Image Credit: MIT News, based on figures courtesy of the researchers.

In June last year, scientists at the University of Massachusetts at Amherst came up with a shocking report predicting that the quantity of power needed for training and finding a specific neural network architecture leads to the emissions of approximately 626,000 pounds of carbon dioxide.

This huge amount of carbon dioxide is equivalent to nearly five times the lifetime emissions of the average U.S. car, which includes its manufacturing.

This problem tends to get even more serious in the model deployment phase, where the deep neural networks must be deployed on different hardware platforms, each with different computational resources and properties.

The researchers at Massachusetts Institute of Technology (MIT) have designed a latest automated AI system for training and running specific neural networks. The outcomes of the study show that, by enhancing the computational efficiency in some crucial ways, the system can reduce the pounds of carbon emissions involved—in certain cases, down to low triple digits.

The system developed by the MIT researchers, termed once-for-all network, trains one huge neural network that includes several pretrained subnetworks of various sizes that can be customized to different hardware platforms without the need for retraining.

This drastically decreases the energy generally needed to train every specialized neural network for new platforms, which could involve billions of internet of things (IoT) devices. The researchers used the system to train a computer-vision model and predicted that the process needed about 1/1,300th the carbon emissions than the current advanced neural architecture search methods, while decreasing the inference time by 1.5 to 2.6 times.

The aim is smaller, greener neural networks. Searching efficient neural network architectures has until now had a huge carbon footprint. But we reduced that footprint by orders of magnitude with these new methods.

Song Han, Assistant Professor, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology

The research was performed on Satori, an efficient computing cluster provided by IBM to MIT that can carry out 2 quadrillion calculations for every second. The study will be presented next week at the International Conference on Learning Representations. Four graduate and undergraduate students from MIT-IBM Watson AI Lab, Shanghai Jiao Tong University, and EECS are joining Han in the study.

Creating a “once-for-all” network

The system was developed based on a recent AI advance known as Auto ML (for automatic machine learning), which avoids the need for manual network design. Neural networks perform automatic search of massive design spaces for network architectures customized, for example, to particular hardware platforms.

However, there is still a training efficiency problem: It is essential to select every model and then train it from scratch for its platform architecture.

How do we train all those networks efficiently for such a broad spectrum of devices — from a $10 IoT device to a $600 smartphone? Given the diversity of IoT devices, the computation cost of neural architecture search will explode.

Song Han, Assistant Professor, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology

An AutoML system invented by the team trains only a single, large “once-for-all” (OFA) network that acts as a “mother” network, nesting huge numbers of subnetworks that are weakly activated from the mother network.

The OFA shares all its learned weights with all subnetworks—in other words, they come pretrained typically. Therefore, every subnetwork can function independently at inference time without the need for retraining.

The group trained an OFA convolutional neural network (CNN)—generally used for image-processing tasks—with all-around architectural configurations, such as different filter sizes, various numbers of layers and “neurons,” and different input image resolutions.

For a particular platform, the system employs the OFA as the search space to identify the best subnetwork based on the latency and precision trade-offs that correlate to the speed and power limits of the platform.

In the case of an IoT device, for example, the system will identify a smaller subnetwork. In the case of smartphones, it will choose larger subnetworks, but with diverse structures based on individual battery lifetimes and computation resources.

OFA dissociates architecture search from model training, and distributes the one-time training charges across several inference hardware platforms and resource constraints.

This depends on a “progressive shrinking” algorithm that can efficiently train the OFA network to support all of the subnetworks at the same time. It begins with training the complete network with the maximum size, and then gradually shrinks the sizes of the network to involve smaller subnetworks.

Using large subnetworks, smaller subnetworks are trained to grow together. Eventually, all the subnetworks with distinct sizes are supported, thereby enabling quick specialization based on the speed and power limits of the platform. It supports several hardware devices without any training costs while adding a new device.

The team identified that one OFA can include over 10 quintillion—1 followed by 19 zeroes —architectural settings in total, encompassing possibly all platforms ever required. However, training the OFA and searching it becomes much more efficient compared to spending hours training each neural network for every platform.

Furthermore, OFA does not compromise inference efficiency or precision. Rather, it offers sophisticated Image Net accuracy on mobile devices. According to the researchers, when compared to advanced industry-leading CNN models, OFA provides 1.5 to 2.6 times speedup, with excellent accuracy.

That’s a breakthrough technology. If we want to run powerful AI on consumer devices, we have to figure out how to shrink AI down to size.

Song Han, Assistant Professor, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology

According to Chuang Gan, “The model is really compact. I am very excited to see OFA can keep pushing the boundary of efficient deep learning on edge devices.” Gan is also the study co-author and a researcher at the MIT-IBM Watson AI Lab.

If rapid progress in AI is to continue, we need to reduce its environmental impact. The upside of developing methods to make AI models smaller and more efficient is that the models may also perform better,” says John Cohn, an IBM fellow and member of the MIT-IBM Watson AI Lab.

Source: http://www.mit.edu/

Tell Us What You Think

Do you have a review, update or anything you would like to add to this news story?

Leave your feedback
Submit