Feature Engineering
Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work.
Features are the way you represent the world to the classifier, and feature engineering has a multiplicative effect on the overall modeling process.
Features are numeric or categorical. Feature engineering techniques are used to define features more accurately for your model. Common techniques include:
- Bucketing
- Crossing
- Hashing
- Embedding
Feature Bucketing - transforming a numeric feature into a categorical feature.
Problem - does income increase linearly with age?
No: income is not in a linear relationship with age, since children under 17 earn very little and earnings fall again after retirement.
Solution - bucket age (a numeric feature) into age groups (categorical features) and give each age group its own weight. This is how we create age buckets.
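A minimal sketch of age bucketing with pandas; the ages, bucket boundaries, and column names here are illustrative assumptions, not prescribed values:

```python
import pandas as pd

# Hypothetical ages; the bucket boundaries below are illustrative choices.
ages = pd.Series([5, 16, 24, 35, 47, 58, 63, 72])

# Bucket the numeric age into categorical age groups so the model can
# learn a separate weight per group instead of a single linear coefficient.
age_buckets = pd.cut(
    ages,
    bins=[0, 18, 30, 45, 60, 120],
    labels=["0-18", "19-30", "31-45", "46-60", "60+"],
)

# One-hot encode the buckets so a linear model gets one weight per group.
bucketed = pd.get_dummies(age_buckets, prefix="age")
print(bucketed)
```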
Feature Crossing - a way to create a new feature that is a combination of existing features.
Problem - can a linear classifier model the interaction between multiple features, say age and education, against income?
No. This is where feature crossing is useful: for each cross (age bucket, education) we create a new true/false feature, so the model can learn a separate weight for every (age bucket, education) combination.
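A minimal sketch of a feature cross with pandas; the column names and values are assumptions for illustration:

```python
import pandas as pd

# Illustrative data; the column names are assumptions for this sketch.
df = pd.DataFrame({
    "age_bucket": ["19-30", "31-45", "19-30", "46-60"],
    "education": ["HS", "BS", "BS", "MS"],
})

# Cross the two categorical features: one combined category per
# (age_bucket, education) pair...
df["age_x_education"] = df["age_bucket"] + "_x_" + df["education"]

# ...then one-hot encode the cross so a linear model gets an
# independent true/false feature (and weight) per combination.
crossed = pd.get_dummies(df["age_x_education"], prefix="cross")
print(crossed)
```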
Feature Hashing or hash buckets
One way to represent a categorical feature with a large vocabulary.
This representation can save memory and is faster to execute.
A categorical feature with a large number of values can be represented even when the vocabulary is not specified in advance.
To avoid collisions, make the number of hash buckets larger than the number of unique values (e.g., unique occupations).
It can also be used to limit the number of possibilities.
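A minimal sketch using scikit-learn's FeatureHasher; the occupation values and bucket count are illustrative assumptions:

```python
from sklearn.feature_extraction import FeatureHasher

# Occupation strings; the vocabulary need not be known in advance.
occupations = [{"occupation": "engineer"},
               {"occupation": "teacher"},
               {"occupation": "zookeeper"}]

# n_features is the number of hash buckets; choosing it comfortably
# larger than the number of unique occupations reduces collisions.
hasher = FeatureHasher(n_features=32, input_type="dict")
hashed = hasher.transform(occupations)
print(hashed.shape)  # (3, 32) sparse matrix
```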
Embedding - represents the meaning of a word as a vector.
Used for large vocabularies.
Embeddings are dense: each word maps to a low-dimensional vector of real numbers, unlike sparse one-hot encodings.
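A minimal sketch of an embedding lookup with NumPy; the toy vocabulary is an assumption, and in practice the embedding matrix is learned during training rather than drawn at random:

```python
import numpy as np

# Toy vocabulary; in a real model the embedding matrix is learned.
vocab = {"king": 0, "queen": 1, "apple": 2}
embedding_dim = 4

rng = np.random.default_rng(0)
# Dense embedding matrix: one low-dimensional vector per vocabulary word.
embeddings = rng.normal(size=(len(vocab), embedding_dim))

# Looking up a word is just indexing into the matrix.
print(embeddings[vocab["queen"]])
```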
Dimensionality Reduction
Dimensionality Reduction can be divided into two subcategories
- Feature Selection which includes Wrappers, Filters, and Embedded.
- Feature Extraction which includes Principal Component Analysis.
Suppose a model depends on three variables a, b, and c, say through the sum a + b + c. If c were equal to 0 or an arbitrarily small number, it wouldn't really be relevant, so it could be taken out of the equation. Here you are using Feature Selection, because you'd be selecting only the relevant variables and leaving out the irrelevant ones.
If instead you can define a single new variable ab = a + b, making a representation of two variables into one, you're using Feature Extraction to reduce the number of variables.
Feature Selection is the process of selecting a subset of relevant features or variables.
There are 3 main subset types:
- Wrappers,
- Filters, and
- Embedded.
Wrappers use a predictive model that scores feature subsets based on the error rate of the model. While they're computationally intensive, they usually produce the best selection of features.
A popular technique is called stepwise regression: an algorithm that adds the best feature, or deletes the worst feature, at each iteration.
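Stepwise regression itself isn't in scikit-learn, but SequentialFeatureSelector is a close forward-selection analogue; a minimal wrapper sketch, with an illustrative dataset and model:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Wrapper: repeatedly refit the model, adding the feature that most
# improves cross-validated score at each iteration (forward selection).
model = LogisticRegression(max_iter=1000)
selector = SequentialFeatureSelector(model, n_features_to_select=2,
                                     direction="forward")
selector.fit(X, y)
print(selector.get_support())  # boolean mask of the selected features
```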
Filters use a proxy measure (such as a statistical test) that is less computationally intensive but slightly less accurate. Filters do capture general properties of the dataset, but because they don't use the model's error measurement, the feature set that's selected will be more general than if a Wrapper was used.
An interesting property of filters is that the selected feature set doesn't embed assumptions from a particular predictive model, making them a useful tool for exposing relationships between features, such as which variables are 'Bad' together and, as a result, drop the accuracy, or 'Good' together and therefore raise it.
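A minimal filter sketch using scikit-learn's SelectKBest with a univariate ANOVA F-test as the proxy measure; the dataset and k are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Filter: score each feature with a proxy measure (here an ANOVA
# F-test) without ever fitting the final predictive model.
selector = SelectKBest(score_func=f_classif, k=2)
X_reduced = selector.fit_transform(X, y)
print(selector.scores_)  # per-feature scores
print(X_reduced.shape)   # (150, 2)
```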
Embedded algorithms learn which features best contribute to an accurate model during the model-building process itself. The most common type is the regularization model.
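A minimal embedded-selection sketch using L1 (Lasso) regularization with scikit-learn; the dataset and alpha value are illustrative assumptions:

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

X, y = load_diabetes(return_X_y=True)

# Embedded: L1 regularization drives uninformative coefficients to
# zero while the model is being fit, performing selection implicitly.
lasso = Lasso(alpha=0.1).fit(X, y)
selector = SelectFromModel(lasso, prefit=True)
print(selector.get_support())  # features with nonzero coefficients
```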
The main linear technique is called Principal Component Analysis.
Principal Component Analysis is the reduction of a higher-dimensional vector space to a lower order through projection.
An easy representation of this would be the projection from a 3-dimensional space to a
2-dimensional one.
A plane is first found which captures most (if not all) of the variation in the data. Then the data is projected onto new axes and a reduction in dimensions occurs. When the projection happens, new axes are created to describe the relationship; these are called the principal axes, and the projected data are called the principal components.
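A minimal PCA sketch with scikit-learn, projecting illustrative 3-dimensional data onto its 2 principal axes:

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy 3-dimensional data; PCA finds the 2 principal axes that
# capture the most variance and projects the data onto them.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

pca = PCA(n_components=2)
principal_components = pca.fit_transform(X)
print(principal_components.shape)     # (100, 2)
print(pca.explained_variance_ratio_)  # variance captured per axis
```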
Thanks for reading!!!