Influential Data, Counterfactuals, and Uncertainties of Predictions#

Objectives: what you will take away#

  • Definitions & an understanding of influential data, counterfactuals, and uncertainty within Engine.

  • How to obtain influential data, counterfactuals, and uncertainties using Engine.

Prerequisites: before you begin#

Notebook Recipe#

The following recipe supplements the content covered in this guide:

Concepts & Terminology#

  • Uncertainty is the amount of information that is unknown about a prediction and is characterized by the prediction’s residual. The Residual is the mean absolute error between predicted and actual values (see the short example after this list).

  • Influential Cases are the records that were directly used to make a prediction or to derive a result.

  • Counterfactuals, or boundary cases, are records whose Context Features are similar to the prediction’s Context Features but whose Action Feature values differ. In other words, these are records with similar information that lead to a different result. For example, if the prediction for fruit type is “peach”, a boundary case might be a very peach-looking “nectarine”.
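
As a quick illustration of the residual (outside of Engine), the mean absolute error between predicted and actual values can be computed directly; the arrays below are made-up example values:

import numpy as np

# Made-up example values, for illustration only
actual = np.array([3.0, 5.0, 7.0])
predicted = np.array([2.5, 5.5, 6.5])

# The residual is the mean absolute error between predicted and actual values
residual = np.mean(np.abs(actual - predicted))  # 0.5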

How-To Guide#

Task 1 - Obtain influential cases and counterfactuals (boundary cases)#

After building, training, and analyzing a Trainee, you can obtain the influential cases and counterfactuals for a prediction on a test case by requesting them in the details of a react() call.

# Details describe the information we are getting from a given react call
details = {
    'influential_cases': True,
    'boundary_cases': True
}

results = t.react(contexts=test_case[context_features], # Input values for which you want a prediction
                context_features=context_features,
                action_features=action_features,
                details=details
)

# Save the influential cases for the prediction of the first record (record 0)
influence_df = pd.DataFrame(results['explanation']['influential_cases'][0])

# Save the boundary cases for the prediction of the first record (record 0)
boundary_df = pd.DataFrame(results['explanation']['boundary_cases'][0])
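
Both results are indexed by record, so the DataFrames above each describe the prediction for record 0. You can inspect them directly; the exact columns depend on your dataset’s features:

# Cases most similar to the test case that drove the prediction
print(influence_df.head())

# Nearby cases that resulted in a different action value
print(boundary_df.head())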

Task 2 - Obtain uncertainty information#

Feature residuals are calculated by holding out each individual feature and then using the remaining features to predict the held-out feature, similar to the leave-one-out validation technique used in traditional machine learning. The results represent the Trainee’s uncertainty for each feature. We will use the local feature residuals to examine the uncertainty for a specific case and the global feature residuals as a baseline.
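
Conceptually, the calculation looks like the sketch below. This is illustrative only, not Engine internals: data and features stand in for a training DataFrame and its feature names, and predict_feature is a hypothetical helper that predicts the held-out feature from the rest.

# Conceptual sketch only -- not Engine internals.
# "data" is a DataFrame of training cases, "features" its column names, and
# "predict_feature" a hypothetical helper that predicts the held-out feature.
feature_residual_sketch = {}
for feature in features:
    remaining = [f for f in features if f != feature]
    predicted = predict_feature(data[remaining], target=feature)
    # Mean absolute error for numeric features
    feature_residual_sketch[feature] = (data[feature] - predicted).abs().mean()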

## Compute local feature residuals
# Details describe the information we are getting from a given react call
details = {
    'feature_residuals': True,
}

results = t.react(contexts=test_case[context_features], # Input values for which you want a prediction
                context_features=context_features,
                action_features=action_features,
                details=details
)

# Save local feature residuals as a one-row DataFrame (one column per feature)
feature_residuals_dicts = results['explanation']['feature_residuals']
feature_residuals = pd.DataFrame([feature_residuals_dicts[0]])

## Compute global feature residuals
# Use react_into_trainee to react to the cases already trained into the Trainee,
# caching residuals, MDA, and robust feature contributions
t.react_into_trainee(context_features=context_features, action_feature=action_features[0], contributions_robust=True, mda=True, residuals=True)

# Retrieve the cached global mean absolute error (residual) for each feature
global_feature_residuals = t.get_prediction_stats(action_feature=action_features[0], stats=['mae'])
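
With both results in hand, you can compare the local residuals for this case against the global baseline; a feature whose local residual is well above its global residual carries higher-than-usual uncertainty for this particular prediction. A minimal sketch, assuming both DataFrames contain one value per feature:

# A minimal sketch, assuming both DataFrames contain one value per feature
for feature in context_features:
    local = float(feature_residuals[feature].iloc[0])
    global_ = float(global_feature_residuals[feature].iloc[0])
    print(f"{feature}: local residual={local:.3f}, global residual={global_:.3f}")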

API References#

  • Trainee.react()