From the mathematical point of view, there is a limit to what a linear function can do and it was noticed that results of polynomials were not good enough to justify the computational expense. Hence the concept of neurons picked up momentum. A lot of Machine Learning is inspired by how the human mind works. The concept of Neural Networks goes one step further - to take inspiration from the way Neurons are laid out in the human brain.
As you can see in the image above, the neurons get multiple inputs and based on these, give out one output that feeds into another set of neurons, and so on. The nervous system is built of many such neurons connected to each other. Each neuron contributes to the decision process by appropriately forwarding the input signal - based on the training it has gathered. Such a model has the potential to hold all that a human brain does. Each neuron has a minimal functionality that can potentially do wonders.
Neurons are implemented as linear function with a non linear topping - called the activation function. Thus, each neuron is defined by weights for each input and a bias. The result of this operation is fed into the activation function. The final output is the input for the next set of neurons. Such Artificial neuron is called a Perceptron.
Often the network has multiple layers of such Perceptrons. That is called MLP (Multy Layer Perceptron). In an MLP, we have an input layer, an output layer and zero or more hidden layers.
Each Perceptron has an array of inputs and an array of weights that are multiplied with the inputs to generate a scalar. This processing is linear - they cannot help fitting a non linear curve - irrespective of the depth of the network. If the network has to fit non linear curves, we need some non linear element on each perceptron. Hence, perceptrons are tipped with a non linear activation function. This could be a sigmoid or tanh or relu ... Researchers have offered several activation functions that have specific advantages.
With everything in place, a neural network looks like this:
The layout, width and depth of the network is one of the most interesting topics of research. Experts have developed different kinds of networks for different kinds of problems. The deeper and larger the network, more is its capacity. Human brain has around 100 billion neurons. Neural Networks are nowhere near that - some researchers quote experiments with million. This concept of large neural networks or Deep Learning is not new. But it was limited to mathematical curiosity and research papers. The recent boom in the availability of massive training data and computing power has made it a big success.
Building, training and tuning Neural Networks is a massive domain and deserves many blogs dedicated to each topic.
Neural Networks offered a major breakthrough in building non linear models. But that was not enough. Any amount of training and data is not enough if our model is not rich enough. After all, the amount of information contained in the model is limited by the number of weights therein. A simple network with a hundred weights cannot define a model for complicated tasks like face recognition. We need a lot more.
It may seem simple, just increasing the number of perceptrons in a network can increase the count of weights. What is the big deal about it? But that is not so simple. As the network grows larger, many other problems start peeping in. In general, the ability of the network does not grow linearly with the number of perceptrons. In fact, it can decrease beyond a point - unless we take care of some important aspects.
The capacity of the networks is a lot better when the network is deeper than wider - that is, if the network has a lot more layers rather than having too many perceptrons in the same layer. Such deep neural networks have enabled miraculous innovations in the past few years. Deep Learning is a branch of machine learning that looks into these aspects of Neural Networks. Deep Learning is the branch of Machine Learning that deals with deep Neural Networks.