Dimensionality Reduction

It is commonly held that more data leads to better AI. That sounds great! But with the recent explosion of information, we often end up with the opposite problem: too much data. We need all of it, yet it is more than we can process. Hence we need ways of streamlining the available data so that it can be compressed without losing value. Dimensionality reduction is an important technique for achieving this.

Consider the simple case of predicting medical expenses based on several parameters. The data may include different parameters related to the negative habits of a person - excessive intake of tobacco, alcohol, caffeine, narcotics, etc. It may seem useful to collect each of these parameters independently. But does that really add value? Are these parameters independent? One can guess that they are strongly correlated. For example, someone who drinks heavily or uses narcotics is quite likely to be liberal about tobacco and caffeine as well.

Clearly, we do not need all of these parameters. But is there a way to choose just two or three of them? That is not so intuitive either. Dimensionality reduction can help us extract a couple of nearly independent parameters from them, which simplifies our prediction model.
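
To make the idea of correlated parameters concrete, here is a minimal sketch in Python with NumPy. The data is entirely synthetic and hypothetical: it assumes a single hidden "lifestyle" factor drives all four habits, which is an illustration and not a claim about real medical data. The names lifestyle, habits and X are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: one hidden "lifestyle" factor drives all four habits,
# and each habit adds its own independent noise on top.
n_people = 500
lifestyle = rng.normal(size=n_people)
habits = ["tobacco", "alcohol", "caffeine", "narcotics"]
X = np.column_stack(
    [lifestyle + 0.5 * rng.normal(size=n_people) for _ in habits]
)

# Pairwise Pearson correlation between the columns (features).
corr = np.corrcoef(X, rowvar=False)

for name, row in zip(habits, corr):
    print(f"{name:>10}", np.round(row, 2))
```

Large off-diagonal values in the printed matrix indicate that the four columns mostly repeat the same information, which is exactly the situation where reducing the number of parameters costs little.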

This was a trivial example. In a real-life scenario, it is common to have hundreds or thousands of features in the input. Dimensionality reduction is a major aid when working on such problems.

There are two approaches to dimensionality reduction, each a different paradigm for addressing the problem. Neither is inherently better - we need to choose one based on the problem at hand.

  • Feature Selection - If a particular feature is almost redundant, we can simply drop it, keeping only the significant features that independently impact the outcome.
  • Feature Combination - This is a bit more complex. When we have N different features that carry information worth only M features (M < N), we need a way to map the N features into a new set of M features. This is true dimensionality reduction.
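
The two approaches above can be sketched side by side. This is a rough illustration under stated assumptions, not a definitive implementation: the data is synthetic (N = 4 observed features built from M = 2 hidden factors), the selection rule is a simple correlation threshold chosen for this example, and the combination step uses principal component analysis (PCA) via an SVD, which is one common technique for feature combination but by no means the only one. The helper names select_features and combine_features are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: N = 4 observed features generated from M = 2 hidden factors.
n_samples, M = 300, 2
hidden = rng.normal(size=(n_samples, M))
mixing = np.array([[1.0, 0.0],
                   [0.9, 0.1],
                   [0.0, 1.0],
                   [0.1, 0.9]])  # shape (N, M)
X = hidden @ mixing.T + 0.05 * rng.normal(size=(n_samples, 4))


def select_features(X, threshold=0.9):
    """Feature selection: keep a column only if it is not strongly
    correlated with a column that has already been kept."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return kept


def combine_features(X, m):
    """Feature combination: project onto the top-m principal components."""
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)
    return centered @ vt[:m].T, explained


kept = select_features(X)
Z, explained = combine_features(X, M)

print("kept columns:", kept)
print("reduced shape:", Z.shape)
print("variance explained by the first", M, "components:", explained[:M].sum())
```

Feature selection returns a subset of the original columns, so the surviving features keep their original meaning. Feature combination returns new columns that are mixtures of all the originals, which usually preserves more information but makes each new feature harder to interpret.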