Support Vector Machine


Support Vector Machine (SVM) is a classification technique. It tries to geometrically divide the data available. If the input data has N features, the data is plotted as points in an N dimensional space. Then, it identifies an N-1 dimensional structure that could separate the groups. The good separation is one which has maximum distance from the two groups. The distance from the group could be identified in various different forms - the distance from the closest points or the distance from the center, or mean of all distances, etc. That would depend upon the kind of data.

But things may not always be so simple. If the features are not separated well enough, we may not have a good plane passing through them. In that case, the scores would be very bad in spite of any tuning. In such a case, we have to rework the features to make sure the data gets segregated properly. We may have to consider a new distance metric that can separate the positive and negative clusters well enough. Or if nothing works, we might have to look for another algorithm.

Hence it is important to have a good idea about the data we have. That can help us choose the correct algorithm and save our time and computation resources.

In effect, this complements the Nearest Neighbor algorithm. Problems that are easier with Nearest Neighbor are difficult with the SVM and vice-versa.