New MIT Neural Network Architecture May Reduce Carbon Footprint of AI

by April 29, 2020


Artificial intelligence has always had a controversial presence, and recently it has raised concerns about its sustainability. In June 2019, a study from the University of Massachusetts Amherst found that training a single large Transformer-based neural network (213 million parameters), designed with neural architecture search (NAS) and of the kind commonly used in machine translation, produced around 626,000 pounds of carbon dioxide. That is roughly five times the lifetime emissions of an average car. Such massive figures stem from the energy needed to run the specialized hardware, such as GPUs and TPUs, used for AI training and development.

According to another study, training Google’s AlphaGo Zero generated a colossal 96 tonnes of CO2 over 40 days of research. That is equivalent to about 1,000 hours of air travel or the carbon footprint of 23 American homes.

Fast forward to the present: a team at the Massachusetts Institute of Technology (MIT) has proposed a solution. The researchers devised a new automated AI system for training and running certain neural networks, built around an optimization strategy that prepares deep neural networks (DNNs) for deployment on diverse hardware platforms and edge devices. The approach promises to reduce the carbon footprint of both training and inference. Based on their results, the researchers believe that by improving computational efficiency, the system can cut carbon emissions, in some cases down to the low triple digits of pounds, compared with six-figure totals.

“The aim is smaller, greener neural networks,” says Song Han, an assistant professor in the Department of Electrical Engineering and Computer Science at MIT.

The team proposes a “once-for-all” (OFA) approach that reduces the number of GPU hours required to train models without compromising accuracy. OFA trains a single large neural network that contains many pretrained sub-networks of different sizes, which can be deployed without further retraining. It relies on a progressive shrinking algorithm: training starts with the full network at its maximum size, then gradually shrinks the network so that progressively smaller sub-networks are trained as well. This decouples model training from architecture search and spreads the one-time training cost across many hardware platforms and resource constraints.
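To make the progressive-shrinking idea concrete, here is a minimal pure-Python sketch under heavily simplified assumptions. The elastic choices (`DEPTHS`, `WIDTHS`), the `SuperNet` class and its toy `train_step` update are all hypothetical stand-ins, not the MIT team’s code: each “weight” is a single number, and “training” just nudges the shared weights that a sampled sub-network uses. What it does show is the core structure: every sub-network is a slice of one shared set of weights, and the schedule first trains only the full network, then also samples shallower and finally narrower sub-networks. The published method has additional elastic dimensions and uses knowledge distillation from the full network, both omitted here.

```python
import random

# Hypothetical elastic choices, largest first (loosely modeled on elastic depth and width).
DEPTHS = [4, 3, 2]  # how many layers a sub-network keeps
WIDTHS = [8, 6, 4]  # how many channels per layer a sub-network keeps


class SuperNet:
    """Toy supernet: weights[layer][channel] is one shared parameter."""

    def __init__(self, max_depth=max(DEPTHS), max_width=max(WIDTHS)):
        self.weights = [[0.0] * max_width for _ in range(max_depth)]

    def subnet(self, depth, width):
        """Extract a sub-network: the first `depth` layers and first `width` channels."""
        return [layer[:width] for layer in self.weights[:depth]]

    def train_step(self, depth, width, lr=0.1):
        """Toy update: touch only the shared weights the sampled sub-network uses."""
        for layer_idx in range(depth):
            for ch in range(width):
                self.weights[layer_idx][ch] += lr


def progressive_shrinking(net, steps_per_phase=100, seed=0):
    """Train the full network first, then progressively admit smaller sub-networks."""
    rng = random.Random(seed)
    phases = [
        (DEPTHS[:1], WIDTHS[:1]),  # phase 1: full network only
        (DEPTHS, WIDTHS[:1]),      # phase 2: also sample shallower sub-networks
        (DEPTHS, WIDTHS),          # phase 3: also sample narrower sub-networks
    ]
    for depth_choices, width_choices in phases:
        for _ in range(steps_per_phase):
            net.train_step(rng.choice(depth_choices), rng.choice(width_choices))
    return net


net = progressive_shrinking(SuperNet())
small = net.subnet(2, 4)  # usable without retraining: it is a slice of the shared weights
print(len(small), len(small[0]))  # 2 4
```

Because weights near the “front” of the network are shared by every sub-network, they receive updates from all phases, while weights used only by the largest configuration are trained mostly in phase 1.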

Needing fewer GPU hours significantly reduces electricity consumption compared with training a separate specialized neural network for each new platform. For a computer-vision model, the team estimates that the process requires roughly 1/1,300 of the carbon emissions of today’s state-of-the-art neural architecture search approaches, while reducing inference time by a factor of 1.5 to 2.6. The OFA system also has the potential to transform both enterprise and consumer devices, and in the team’s internal tests it performed up to 2.6 times better than some NAS-created models.
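Once such a supernet is trained, specializing it for a new platform becomes a search rather than a new training run. The sketch below illustrates that step with entirely hypothetical numbers: `PLATFORM_COST_MS` is an invented per-device latency model, and `predicted_accuracy` is a crude stand-in that simply favors larger sub-networks. The published OFA work pairs a learned accuracy predictor with per-device latency estimates and an evolutionary search; this sketch swaps in exhaustive search, which is only practical because the toy space has nine configurations.

```python
from itertools import product

# Hypothetical elastic choices a toy supernet might expose.
DEPTHS = [4, 3, 2]
WIDTHS = [8, 6, 4]

# Hypothetical latency model: milliseconds per (layer x channel) on each device.
PLATFORM_COST_MS = {"cloud_gpu": 0.05, "phone_cpu": 0.6, "microcontroller": 1.2}


def predicted_accuracy(depth, width):
    """Crude stand-in for a learned accuracy predictor: larger sub-networks score higher."""
    return depth * width


def latency_ms(platform, depth, width):
    """Estimated latency of a sub-network on the given platform under the toy cost model."""
    return PLATFORM_COST_MS[platform] * depth * width


def specialize(platform, budget_ms):
    """Return the highest-scoring (depth, width) that fits the latency budget, or None.

    No training happens here: every candidate is already a slice of the shared supernet.
    """
    feasible = [
        (d, w)
        for d, w in product(DEPTHS, WIDTHS)
        if latency_ms(platform, d, w) <= budget_ms
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda dw: predicted_accuracy(*dw))


for platform in PLATFORM_COST_MS:
    print(platform, specialize(platform, budget_ms=12))
# cloud_gpu (4, 8)
# phone_cpu (3, 6)
# microcontroller (2, 4)
```

The same trained weights serve all three devices; only the cheap search is repeated per platform, which is the sense in which a once-for-all network spreads its one-time training cost across hardware targets.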

The researchers used Satori, a new supercomputer cluster donated to MIT by IBM. Satori is considered one of the world’s greenest supercomputers and can perform approximately 2 quadrillion calculations per second.

“Satori is equipped to give energy/carbon feedback to users, which makes it an excellent ‘laboratory’ for improving the carbon footprint of both AI hardware and software,” says John Cohn, an IBM fellow and member of the MIT-IBM Watson AI Lab. He adds, “If rapid progress in AI is to continue, we need to reduce its environmental impact. The upside of developing methods to make AI models smaller and more efficient is that the models may also perform better.”