Feature Selection
What is Feature Selection?
When building a machine learning model on real-world data, it is rare that all the variables in the dataset are useful. Adding redundant variables reduces the generalization capability of the model and may also reduce the overall accuracy of a classifier. Furthermore, adding more and more variables increases the overall complexity of the model. The goal of feature selection in machine learning is to find the subset of the available input features that is most relevant to the use case or prediction problem.
One way to classify feature selection methods is by whether they make use of the target variable:
- Unsupervised: Do not use the target variable (e.g. remove redundant variables).
- Supervised: Use the target variable (e.g. remove irrelevant variables). Supervised methods can be further divided into:
  - Wrapper: Search for well-performing subsets of features, e.g. Recursive Feature Elimination (RFE).
  - Filter: Select subsets of features based on their relationship with the target, either via i) statistical methods based on the statistical relationship between variables (e.g. correlation, ANOVA, chi-squared, mutual information) or ii) feature importance methods.
  - Intrinsic: Algorithms that perform automatic feature selection during training, e.g. tree-based algorithms such as decision trees and random forests (see the sketch after this list).
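To make the "intrinsic" category concrete, here is a minimal sketch, assuming scikit-learn and synthetic data from make_classification, of how a tree-based model ranks features as a by-product of training; the hyperparameters and feature indices are illustrative only.

```python
# Intrinsic selection: a random forest ranks features while it trains.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: 10 features, only 3 of which are informative.
X, y = make_classification(n_samples=500, n_features=10, n_informative=3,
                           random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# Importances sum to 1; higher means the feature contributed more to the trees' splits.
ranked = sorted(enumerate(model.feature_importances_), key=lambda p: p[1], reverse=True)
for idx, score in ranked:
    print(f"feature_{idx}: {score:.3f}")
```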
Another way to think about the problem is in terms of eliminating features versus combining them, i.e. transforming a higher-dimensional feature matrix into a lower-dimensional one. Methods in the latter category are called dimensionality reduction methods.
Dimensionality reduction methods and the feature selection methods discussed earlier often serve the same purpose. However, one big difference is that traditional feature selection methods do not change the features themselves: they simply select or exclude a set of features for the problem at hand, whereas dimensionality reduction methods transform the input features into a new set of feature variables in order to achieve a lower number of features, as sketched below.
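The distinction is easy to see in code. The following sketch, again assuming scikit-learn and synthetic data, contrasts SelectKBest, which keeps a subset of the original columns, with PCA, which produces new columns that are linear combinations of all inputs.

```python
# Selection keeps original columns; PCA builds new, transformed columns.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           random_state=0)

# Feature selection: output columns are a subset of the input columns.
selector = SelectKBest(score_func=f_classif, k=3).fit(X, y)
print("kept original columns:", selector.get_support(indices=True))

# Dimensionality reduction: each output column mixes all input columns.
pca = PCA(n_components=3).fit(X)
print("weights of the first component:", np.round(pca.components_[0], 2))
```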
Common techniques used for feature selection/dimensionality reduction (code sketches for several of these follow the list):
- Remove features with missing values
- Remove features with low variance (Variance threshold)
- Remove highly correlated features
- Recursive Feature Elimination
- Feature selection using feature weights from a trained model (SelectFromModel): this requires you to provide an estimator, which is then trained on all the available features and the target. After training, the features whose weights (e.g. coefficients or importances) meet a threshold are kept and the data is reduced to those features.
- Feature selection using the k best features based on a chosen metric (SelectKBest/SelectPercentile): the metric could be f_classif, mutual information, chi2, f_regression, or false positive rate (FPR).
- Dimensionality Reduction (PCA): useful in many scenarios, but particularly where there is excessive multicollinearity and explainability is not important.
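Here is a minimal sketch of the first three techniques in the list, assuming pandas and scikit-learn; the thresholds (40% missing, variance 0.01, correlation 0.9) and the toy columns are illustrative choices, not recommendations.

```python
# Basic filters: missing values, low variance, high correlation.
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, np.nan, np.nan],   # mostly missing
    "b": [1.0, 1.0, 1.0, 1.0, 1.0],            # zero variance
    "c": [1.0, 2.0, 3.0, 4.0, 5.0],
    "d": [2.0, 4.1, 5.9, 8.2, 10.0],           # highly correlated with c
})

# 1) Drop features whose fraction of missing values exceeds a threshold.
df = df.loc[:, df.isna().mean() <= 0.4]

# 2) Drop (near-)constant features.
vt = VarianceThreshold(threshold=0.01)
vt.fit(df)
df = df.loc[:, vt.get_support()]

# 3) Drop one feature from each pair whose absolute correlation is high.
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
df = df.drop(columns=to_drop)

print(df.columns.tolist())   # e.g. ['c'] after all three steps
```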
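And a sketch of the model-based and univariate selectors mentioned above (RFE, SelectFromModel, SelectKBest), again on synthetic data; the choice of logistic regression and random forest as estimators is only an example.

```python
# Wrapper, model-based, and filter selection side by side.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=12, n_informative=4,
                           random_state=0)

# Wrapper: recursively drop the weakest feature until 4 remain.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=4)
print("RFE keeps:", rfe.fit(X, y).get_support(indices=True))

# SelectFromModel: train the estimator once, keep features whose weight
# (here, tree importance) is above the default threshold (the mean).
sfm = SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0))
print("SelectFromModel keeps:", sfm.fit(X, y).get_support(indices=True))

# Filter: keep the k features with the best ANOVA F-score against the target.
skb = SelectKBest(score_func=f_classif, k=4)
print("SelectKBest keeps:", skb.fit(X, y).get_support(indices=True))
```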
Benefits of Feature Selection
- Reduces overfitting: keeping irrelevant features can result in overfitting. Fewer features mean less redundancy and therefore a lower chance of overfitting, which would otherwise reduce the predictive capability of the model.
- Improves accuracy of the model: with less misleading data, the accuracy of the model increases.
- Reduces training time: removing irrelevant features lowers the complexity of the problem because fewer dimensions are present, so algorithms train faster.
- Improves interpretability of the model: reduced complexity results in better interpretability.