Feature Importance#
Objectives: what you will take away#
Definitions & Understanding: the difference between global and local, and between robust and non-robust, Feature Contributions and Feature MDA.
How-To: obtain both feature importance metrics.
Prerequisites: before you begin#
You have successfully installed Howso Engine
Notebook Recipe#
The following recipe will supplement the content this guide will cover:
Concepts & Terminology#
The main piece of terminology this guide introduces is the concept of Feature Importance. To understand this, we recommend being familiar with the following concepts:
Local vs Global#
Feature Contributions and Feature MDA can be calculated and retrieved both locally and globally. Conceptually, global metrics are measured using all of the cases in the Trainee. Local metrics start from either a specific subset of those cases or a set of new cases, and are calculated using the cases most similar to the specified ones.
Local
Trainee.react() along with the details parameter can be used for local metrics.
Global
Trainee.react_into_trainee() along with Trainee.get_prediction_stats() are used for global metrics.
Robust vs Non-Robust#
To calculate feature importance, Howso Engine measures the impact on the prediction by comparing predictions made with and without the feature of interest. The feature set without the feature of interest may contain either all of the other features or a combination of any number of them. Non-robust calculations use a leave-one-out approach, so the metrics reflect the results when all features except the feature of interest are used. Robust calculations compare the results obtained by sampling from the power set of all feature combinations, with and without the feature of interest. Robust metrics are recommended because they cover a greater variety of feature sets and because their calculation performs better as the number of features increases.
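The difference between the two approaches can be illustrated with a toy sketch. This is not Howso Engine's implementation: the nearest-neighbor predictor, the use of an absolute prediction difference as the "contribution", and every function name below are assumptions made purely for illustration.

```python
import random


def predict(train, query, features, k=3):
    """Toy k-nearest-neighbor regressor that only looks at `features`."""
    if not features:
        # With no context features, fall back to the mean of the target.
        return sum(row["y"] for row in train) / len(train)
    nearest = sorted(
        train, key=lambda row: sum((row[f] - query[f]) ** 2 for f in features)
    )[:k]
    return sum(row["y"] for row in nearest) / len(nearest)


def non_robust_contribution(train, query, features, feature):
    """Leave-one-out: all features versus all features except `feature`."""
    others = [f for f in features if f != feature]
    return abs(predict(train, query, features) - predict(train, query, others))


def robust_contribution(train, query, features, feature, samples=200, seed=0):
    """Average the with/without difference over sampled subsets of the others."""
    rng = random.Random(seed)
    others = [f for f in features if f != feature]
    total = 0.0
    for _ in range(samples):
        # Draw one member of the power set of the remaining features.
        subset = [f for f in others if rng.random() < 0.5]
        with_f = predict(train, query, subset + [feature])
        without_f = predict(train, query, subset)
        total += abs(with_f - without_f)
    return total / samples


# Hypothetical data: the target depends on "a" only; "b" is unrelated.
train = [{"a": i, "b": (i * 7) % 10, "y": 2 * i} for i in range(20)]
query = {"a": 5, "b": 3}
features = ["a", "b"]

for name in features:
    print(
        name,
        non_robust_contribution(train, query, features, name),
        robust_contribution(train, query, features, name),
    )
```

In this toy data both approaches rank "a" above "b". The robust variant additionally measures each feature in contexts where other features are missing.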
How-To Guide#
Global Feature Importance#
To get global feature importance metrics, Trainee.react_into_trainee() is first called on a trained and analyzed Trainee. Trainee.react_into_trainee() calls react internally on the cases already trained into the Trainee and calculates the metrics. The desired metrics are selected through individually named parameters of this method, and setting a parameter to True caches the corresponding metric. For example, mda_robust and contributions_robust calculate the robust versions of MDA and Feature Contributions, while mda and contributions calculate the non-robust versions.
t.react_into_trainee(
    context_features=context_features,
    action_feature=action_features[0],
    mda_robust=True,
    contributions_robust=True
)
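For comparison, a non-robust variant of the same call might look like the sketch below. The wrapper function and its name are hypothetical. Only the mda and contributions parameter names come from this guide, and the sketch assumes they accept the same call shape as their robust counterparts.

```python
def cache_non_robust_importance(trainee, context_features, action_feature):
    """Cache the non-robust (leave-one-out) MDA and Feature Contributions.

    `trainee` is assumed to be a trained and analyzed Trainee.
    """
    trainee.react_into_trainee(
        context_features=context_features,
        action_feature=action_feature,
        mda=True,            # non-robust MDA
        contributions=True,  # non-robust Feature Contributions
    )
```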
To extract the metrics, Trainee.get_prediction_stats() is called. An action feature must be specified, and the stats parameter is used to determine which metrics to return. The stats parameter takes a list, so multiple metrics may be requested together, but in this example they are requested separately. If robust metrics were calculated, the robust parameter must be set to True to retrieve them. If non-robust metrics were calculated, the robust parameter can be left at its default value.
robust_feature_contributions = t.get_prediction_stats(action_feature=action_features[0], robust=True, stats=['contribution'])
robust_feature_mda = t.get_prediction_stats(action_feature=action_features[0], robust=True, stats=['mda'])
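Because stats takes a list, both metrics can also be requested in one call. The helper below is a hypothetical convenience wrapper, not part of Howso Engine. It assumes the matching metrics were already cached with Trainee.react_into_trainee(), and it only passes robust when robust metrics are wanted, leaving the default in place otherwise.

```python
def get_feature_importance(trainee, action_feature, robust=True):
    """Retrieve Feature Contributions and MDA together in a single call."""
    kwargs = {
        "action_feature": action_feature,
        "stats": ["contribution", "mda"],
    }
    if robust:
        # Required to retrieve metrics that were cached as robust.
        kwargs["robust"] = True
    return trainee.get_prediction_stats(**kwargs)
```

For example, get_feature_importance(t, action_features[0]) would request the robust metrics cached in the earlier step.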
Local Feature Importance#
To get local feature importance metrics, Trainee.react() is first called on a trained and analyzed Trainee. The desired metrics, feature_contributions and feature_mda, are selected through the details parameter, which takes a dictionary of key-value pairs. Each metric is a separately named key, and setting it to True calculates that metric. Robust calculations are performed by default.
details = {
    'feature_contributions': True,
    'feature_mda': True,
}
results = t.react(
    df,
    context_features=context_features,
    action_features=action_features,
    details=details
)
The calculated stats can be retrieved from the Trainee.react() output dictionary. They are stored under the explanation key, under the name of the metric. Whether these metrics are robust or non-robust is determined when they are calculated in Trainee.react() in the previous step.
robust_feature_contributions = results['explanation']['feature_contributions']
robust_feature_mda = results['explanation']['feature_mda']
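Local metrics are produced per case, so it can be useful to summarize them across the reacted cases. The sketch below assumes, without confirmation from this guide, that the retrieved value is a list containing one dictionary per case that maps feature names to numeric values. The helper name is hypothetical, and the structure should be checked against the actual output before relying on it.

```python
def rank_features(per_case_values):
    """Rank features by mean absolute value across cases, largest first.

    Assumes `per_case_values` is a list of {feature_name: number} dicts,
    one per case. This structure is an assumption, not a documented format.
    """
    collected = {}
    for case in per_case_values:
        for feature, value in case.items():
            collected.setdefault(feature, []).append(abs(value))
    means = {f: sum(vals) / len(vals) for f, vals in collected.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Hypothetical values for two cases and two features.
example = [{"a": 1.0, "b": -3.0}, {"a": 2.0, "b": 1.0}]
print(rank_features(example))  # [('b', 2.0), ('a', 1.5)]
```

Taking the absolute value suits contributions, where the magnitude is of interest. For MDA the sign may carry meaning, so keeping the signed values could be preferable there.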
Warning
Contributions and MDA are also metrics for cases and not just features, so please be aware when reading other guides that may use those terms.
Example Use-Cases#
In addition to the examples above, here are a few example use-cases for feature importance.