Logistic Regression is a very important concept in Supervised Machine Learning - because it helps you use the powerful techniques of Regression based learning to the Classification problems. Typically Regression algorithms deals with data where input as well as output are continuous. Logistic Regression extends the same algorithms to binary classification.

This is done by mapping the output of the matrix product into a binary output. This is done using an activation function. An activation function is a simple function that compares the input value with some threshold and generates a near binary, differentiable output. Not just linear regression, the activation function can be used to map any regression model over to a logistic model.

There are several different types of activation functions - Sigmoid, Tanh and Relu are the most popular ones. Essentially, these are continuous functions that generate "near binary" output. That is, the output is similar for input values much less than 0 and also for values much more than 0. And there is a strong gradient around 0. This approximates very well to a binary classifier - although mathematically, the function is continuous. Thus, we can use the powerful techniques of Regression.

The activation function forms a major component of logistic regression. There are different types of activation functions for different kinds of needs. Sigmoid, Tanh, Relu are some.

The sigmoid function is very close to 0 for large negative numbers, and very close to 1 for large positive numbers. The gradient is steep around 0. That makes sigmoid a very good activation function.

`sigmoid(x) = 1 / (1 + e**(-x))`

The value of e**(-x)) is very high for negative values of x and very low for positive values of x. Hence 1 / (1 + e**(-x)) is almost 0 for negative numbers and almost 1 for positive numbers. The gradient is very high around 0. At 0, its value is 1/2.

Thus, the sigmoid function can be used for classification - making it a good candidate for activation.

In principle, Tanh is quite similar to the Sigmoid function. But its value ranges between -1 and 1.

`tanh(x) = (e**x - e**(-x)) / (e**x + e**(-x))`

For negative values of x, e**x is close to 0, so the value is close to -e**(-x) / e**(-x). That is -1. And for positive values of x, e**(-x) is close to 0. So the value is close to e**(x) / e**(x). That is 1. The gradient is steep around 0.

This is a bit different from Sigmoid and Tanh. Arithmetically it is a lot simpler than them.

`relu(x) = x > 0 ? x : 0 # Value is x if x > 0 and 0 if x <= 0.`

It's application is not so simple for final classification. But it is very good for intermediate phases. We will check that when we look at the implementations.