DeepMind’s Perceiver IO is Now an Open-Source Deep Learning Model

Read about the new development in deep learning technology.

DeepMind has open-sourced Perceiver IO, a general-purpose deep learning model architecture that handles many types of inputs and outputs. As described on DeepMind's blog, Perceiver IO can serve as a replacement for transformers, using attention to map inputs into a latent representation space. This sidesteps a key drawback of the transformer: Perceiver IO can handle longer input sequences without incurring quadratic compute and memory costs. The model can work with different types of data such as text, images, and sound, and can decode its output into any data type.

A team of researchers at DeepMind ran a series of experiments to evaluate the model's performance on various tasks. On language understanding, image classification, optical flow, and video-game playing, the model performed comparably to other models of its kind, and it achieved a high score on the Sintel optical flow benchmark.

Most deep learning models rely on architectures designed for one fixed type of data: computer vision models traditionally use convolutional neural networks (CNNs), while natural language processing models are powered by sequence-learning architectures such as the transformer. Systems that handle multi-modal data, like Google's combined vision-language model, typically process each type of input with its own domain-specific architecture and then combine the results using a third module.

Many researchers have applied the Transformer architecture to computer vision problems, but these approaches often begin with a CNN-based pre-processing step. The Transformer also requires compute and memory resources that grow with the square of the input sequence length, which makes it impractical for high-dimensional data types.
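To make the scaling argument concrete, here is a back-of-the-envelope sketch in Python that counts attention score entries (query-key pairs). The function names, the 224 x 224 image size, the 512 latents, and the 8 latent layers are illustrative assumptions, not figures taken from the article.

```python
def self_attention_pairs(seq_len: int) -> int:
    """Score entries for ONE layer of full self-attention over the raw input."""
    return seq_len * seq_len


def latent_attention_pairs(num_inputs: int, num_latents: int,
                           num_outputs: int, num_latent_layers: int) -> int:
    """Score entries for a whole latent-bottleneck model (hypothetical helper)."""
    encode = num_inputs * num_latents                        # inputs -> latents
    process = num_latent_layers * num_latents * num_latents  # latents -> latents
    decode = num_outputs * num_latents                       # latents -> outputs
    return encode + process + decode


# Assumed sizes: every pixel of a 224 x 224 image is treated as one input element.
num_pixels = 224 * 224
direct = self_attention_pairs(num_pixels)
bottleneck = latent_attention_pairs(num_pixels, num_latents=512,
                                    num_outputs=1, num_latent_layers=8)

print(f"one full self-attention layer: {direct:,} pairs")
print(f"whole latent-bottleneck stack: {bottleneck:,} pairs")
```

Note that the first count covers a single layer while the second covers the entire stack. The only term that touches the raw input grows linearly with its size, which is why longer inputs remain affordable.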

Perceiver IO addresses this problem by using cross-attention to project high-dimensional input arrays into a lower-dimensional latent space. This latent space is then processed using a standard Transformer self-attention structure. Because the latent space is much smaller than the input, the processing stack can be deeper than that of a transformer that directly processes large inputs. Lastly, the latent representation is converted to output by applying a query array with the same number of elements as the desired output.
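The encode, process, and decode steps can be sketched in a few lines of NumPy. This is a simplified illustration, not DeepMind's released code: it assumes single-head attention, omits the learned projections, layer normalization, and MLP blocks, and uses random arrays in place of learned latents and queries. The names `attend` and `perceiver_io_sketch` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(queries, keys_values):
    """Single-head scaled dot-product attention.

    Assumption: keys and values are the same array and there are no
    learned projection matrices, to keep the sketch short.
    """
    scale = np.sqrt(queries.shape[-1])
    scores = queries @ keys_values.T / scale   # (num_queries, num_kv)
    return softmax(scores) @ keys_values       # (num_queries, channels)


def perceiver_io_sketch(inputs, latents, output_queries, num_latent_layers=4):
    # Encode: cross-attention from a small latent array to the large input.
    z = attend(latents, inputs)
    # Process: self-attention over the latents only (with a residual connection).
    for _ in range(num_latent_layers):
        z = z + attend(z, z)
    # Decode: one output element per row of the query array.
    return attend(output_queries, z)


rng = np.random.default_rng(0)
channels = 32
inputs = rng.normal(size=(10_000, channels))      # large input array
latents = rng.normal(size=(64, channels))         # small latent array
output_queries = rng.normal(size=(5, channels))   # five desired outputs

outputs = perceiver_io_sketch(inputs, latents, output_queries)
print(outputs.shape)  # (5, 32)
```

The output has one row per query, regardless of how large the input array is, which is what lets the same structure produce outputs of different sizes and types.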

The research team at DeepMind evaluated Perceiver IO on a range of tasks in different domains. For natural language processing, the team used the GLUE benchmark, on which Perceiver IO performed slightly better than the BERT model. For computer vision, the team used the ImageNet benchmark and tested Perceiver IO with several variations of positional feature inputs, some of which contained 2D spatial information. With that information, the model performed comparably to the ResNet-50 baseline. When the model used only learned positional features, with no knowledge of the 2D image structure, it performed worse: it reached 72.7% accuracy, while ResNet-50 reached 78.6%. Even so, the authors believe this is the best result by any model on ImageNet without 2D architectural feature information. To further test the model's capabilities, the research team used Perceiver IO as a drop-in replacement for the Transformer module in AlphaStar, a video-game-playing artificial intelligence. In this test, the model achieved the same level of performance as the original.

Summing up the release, the researchers said, "We hope our latest paper and the code available on Github help researchers and practitioners tackle problems without needing to invest the time and effort to build custom solutions using specialized systems. As we continue to learn from exploring new kinds of data, we look forward to further improving upon this general-purpose architecture and making it faster and easier to solve problems throughout science and machine learning."

Analytics Insight
www.analyticsinsight.net