The History, Evolution and Growth of Deep Learning

Over the years, deep learning has evolved, causing massive disruption across industries and business domains. Deep learning is a branch of machine learning that uses layered algorithms to process data, imitate aspects of the thinking process, and develop abstractions. These layers are used for tasks such as understanding human speech and recognizing objects visually. Information is passed through each layer, and the output of the previous layer acts as the input for the next layer. The first layer in a network is referred to as the input layer and the last as the output layer; the layers in between are referred to as hidden layers. Each layer is typically a simple, uniform algorithm consisting of one kind of activation function.
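The layer-to-layer flow described above can be illustrated with a minimal sketch. It assumes a sigmoid activation and small hand-picked weights; the function names and all numbers are hypothetical, chosen only to show each layer's output becoming the next layer's input.

```python
import math

def sigmoid(x):
    """A common activation function; assumed here for illustration."""
    return 1.0 / (1.0 + math.exp(-x))

def dense_layer(inputs, weights, biases, activation):
    """One layer: a weighted sum per unit, then the same activation for every unit."""
    return [
        activation(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Pass the input through each layer; each output feeds the next layer."""
    values = inputs
    for weights, biases, activation in layers:
        values = dense_layer(values, weights, biases, activation)
    return values

# Hypothetical weights: 2 inputs -> 3 hidden units -> 1 output unit.
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1], sigmoid),  # hidden layer
    ([[0.7, -0.5, 0.2]], [0.05], sigmoid),                                # output layer
]

output = forward([1.0, 2.0], layers)
print(output)  # a single value between 0 and 1
```

In a real network the weights would be learned from data rather than written by hand.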

Another aspect of deep learning is feature extraction, which uses an algorithm to automatically construct meaningful features of the data for learning, training, and understanding.

History of Deep Learning Over the Years

The history of deep learning dates back to 1943, when Warren McCulloch and Walter Pitts created a computer model based on the neural networks of the human brain. They used a combination of mathematics and algorithms, which they called threshold logic, to mimic the thought process. Since then, deep learning has evolved steadily, with two significant breaks in its development. The development of the basics of a continuous back propagation model is credited to Henry J. Kelley in 1960. Stuart Dreyfus came up with a simpler version based only on the chain rule in 1962. The concept of back propagation thus existed in the early 1960s, but it did not become useful until 1985.

Developing Deep Learning Algorithms

The earliest efforts in developing deep learning algorithms date to 1965, when Alexey Grigoryevich Ivakhnenko and Valentin Grigorʹevich Lapa used models with polynomial (complicated equations) activation functions, which were subsequently analysed statistically.

During the 1970s, the development of AI suffered a brief setback: a lack of funding limited both deep learning and artificial intelligence research. However, individuals carried on the research without funding through those difficult years.

Convolutional neural networks were first used by Kunihiko Fukushima, who designed neural networks with multiple pooling and convolutional layers. In 1979, he developed an artificial neural network called the Neocognitron, which used a multi-layered, hierarchical design. This design allowed the computer to learn to recognize visual patterns. The networks resembled modern versions and were trained with a reinforcement strategy of recurring activation in multiple layers, which gained strength over time.

The FORTRAN code for Back Propagation

In the 1970s, back propagation, which uses errors to train deep learning models, was developed further. It became popular when Seppo Linnainmaa wrote his master's thesis, which included FORTRAN code for back propagation. Though developed in the 1970s, the concept was not applied to neural networks until 1985, when Rumelhart, Hinton, and Williams demonstrated that back propagation in a neural network could provide interesting distributed representations. Yann LeCun provided the first practical demonstration of back propagation at Bell Labs in 1989 by combining convolutional neural networks with back propagation to read handwritten digits. This combination was later used to read the numbers on handwritten checks.

The period from 1985 into the 1990s brought the second lull in artificial intelligence, which affected research on neural networks and deep learning. Work nevertheless continued: in 1995, Corinna Cortes and Vladimir Vapnik developed the support vector machine, a system for mapping and recognizing similar data. Long short-term memory, or LSTM, was developed in 1997 by Sepp Hochreiter and Juergen Schmidhuber for recurrent neural networks.

The next significant deep learning advancement came in 1999, when computers began to benefit from the speed of GPU processing. Faster processing increased computational speeds by a factor of 1,000 over a 10-year span. In this era, neural networks began competing with support vector machines. Neural networks offered better results using the same data, though they were slower than a support vector machine.

Deep Learning from the 2000s and Beyond

The vanishing gradient problem came to light around the year 2000, when it was discovered that "features" (lessons) formed in lower layers were not being learned by the upper layers because no learning signal reached those layers. This was not a fundamental problem for all neural networks; it is restricted to gradient-based learning methods. The source of the problem turned out to be certain activation functions, which condensed their input and reduced the output range in a chaotic fashion. As a result, large areas of input were mapped over an extremely small range.
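A small numeric sketch can make this concrete. It assumes the sigmoid as an example of a "condensing" activation function (the text does not name a specific one) and, for simplicity, treats every weight as 1, so the learning signal passed back through a chain of layers is just the product of the activation's derivatives.

```python
import math

def sigmoid(x):
    """Squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    """Slope of the sigmoid; it never exceeds 0.25."""
    s = sigmoid(x)
    return s * (1.0 - s)

def gradient_scale(pre_activations):
    """Product of sigmoid slopes along a chain of layers (weights assumed to be 1)."""
    scale = 1.0
    for z in pre_activations:
        scale *= sigmoid_derivative(z)
    return scale

# A wide range of inputs is condensed into a narrow output range.
squashed = [sigmoid(x) for x in (-10.0, -5.0, 5.0, 10.0)]

# Best case for the sigmoid: every unit sits at 0, where the slope is 0.25.
shallow = gradient_scale([0.0] * 2)   # 0.25 ** 2
deep = gradient_scale([0.0] * 10)     # 0.25 ** 10

print(squashed)
print(shallow, deep)
```

Even in this best case, the signal shrinks by at least a factor of four per layer, so after ten layers less than one millionth of it remains.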

In 2001, a research report compiled by the META Group (now part of Gartner) described the challenges and opportunities of three-dimensional data growth. The report marked the onslaught of Big Data and described the increasing volume and speed of data, along with the increasing range of data sources and types.

Fei-Fei Li, an AI professor at Stanford, launched ImageNet in 2009, assembling a free database of more than 14 million labeled images. These images served as inputs for training neural nets. The speed of GPUs had increased significantly by 2011, making it possible to train convolutional neural networks without layer-by-layer pre-training. With this added computing speed, deep learning showed significant advantages in efficiency and speed.

The Cat Experiment

In 2012, Google Brain released the results of an unusual, free-spirited project called the Cat Experiment, which explored the difficulties of unsupervised learning. Deep learning typically uses supervised learning, which means the convolutional neural net is trained using labeled data, such as the images from ImageNet. The Cat Experiment instead used a neural net spread over 1,000 computers, with ten million unlabeled images taken randomly from YouTube as inputs to the training software. Since then, unsupervised learning has remained a significant goal in the field of deep learning.

2018 and the years beyond will mark the evolution of artificial intelligence, which will depend on deep learning. Deep learning is still in its growth phase and in constant need of creative ideas to evolve further.
