Residuals#

Objectives: what you will take away#

  • How to retrieve global and local residuals.

Prerequisites: before you begin#

Data#

Our example dataset for this recipe is the well-known Adult dataset. It is accessible via the pmlb package installed earlier. We use the fetch_data() function to retrieve the dataset in the Setup section below.

Concepts & Terminology#

How-To Guide#

Setup#

This guide assumes you have created and set up a Trainee as demonstrated in the basic workflow guide. The created Trainee is referenced as trainee in the sections below.

[1]:
import pandas as pd
from pmlb import fetch_data

from howso.engine import Trainee
from howso.utilities import infer_feature_attributes

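# Load the Adult dataset and keep a random sample of 1,000 rows
df = fetch_data('adult').sample(1_000)
# Infer feature types and bounds from the sampled data
features = infer_feature_attributes(df)

# Create the Trainee, train it on the sample, then analyze it
trainee = Trainee(features=features)
trainee.train(df)
trainee.analyze()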

features.to_dataframe()
[1]:
                type        decimal_places  bounds                                                   data_type  original_type
                                            min   max        allow_null  observed_min  observed_max             data_type  size
age             continuous  0               0.0   137.0      True        17.0          90.0          number     numeric    8
workclass       nominal     0               NaN   NaN        False       NaN           NaN           number     integer    8
fnlwgt          continuous  0               0.0   1322258.0  True        19302.0       809585.0      number     numeric    8
education       nominal     0               NaN   NaN        False       NaN           NaN           number     integer    8
education-num   continuous  0               0.0   26.0       True        1.0           16.0          number     numeric    8
marital-status  nominal     0               NaN   NaN        False       NaN           NaN           number     integer    8
occupation      nominal     0               NaN   NaN        False       NaN           NaN           number     integer    8
relationship    nominal     0               NaN   NaN        False       NaN           NaN           number     integer    8
race            nominal     0               NaN   NaN        False       NaN           NaN           number     integer    8
sex             nominal     0               NaN   NaN        False       NaN           NaN           number     integer    8
capital-gain    continuous  0               0.0   164870.0   True        0.0           99999.0       number     numeric    8
capital-loss    continuous  0               0.0   4656.0     True        0.0           2824.0        number     numeric    8
hours-per-week  continuous  0               0.0   163.0      True        1.0           99.0          number     numeric    8
native-country  nominal     0               NaN   NaN        False       NaN           NaN           number     integer    8
target          nominal     0               NaN   NaN        False       NaN           NaN           number     integer    8

Local Residuals#

Local metrics are retrieved using Trainee.react(). Both robust and non-robust (full) versions are available, although the full version is recommended for residuals.

[2]:
# Get local full residuals
details = {'feature_full_residuals_for_case': True}
results = trainee.react(
    df.iloc[[-1]],
    context_features=features.get_names(without=["target"]),
    action_features=["target"],
    details=details
)

residuals = results['details']['feature_full_residuals_for_case']
residuals
[2]:
[{'workclass': 0.05552155884324439,
  'relationship': 0.9488982970831705,
  'capital-gain': 3132,
  'marital-status': 0.894451821150195,
  'age': 5,
  'sex': 0,
  'fnlwgt': 79346,
  'capital-loss': 0,
  'education-num': 0,
  'education': 0,
  'race': 0.9321238485895957,
  'occupation': 0.7492518799140462,
  'hours-per-week': 3,
  'native-country': 0.1489881109797705,
  'target': 0.06859635519050711}]
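
The output above is a list holding one dictionary for the single case passed to react(). The continuous values (for example age and fnlwgt) appear to be on the scale of each feature, while the nominal values fall between 0 and 1, so it can help to look at the two groups separately. The following is a minimal sketch that works on a copy of the values shown above rather than on a live Trainee. The CONTINUOUS set is hard-coded from the feature attributes table in Setup and is an assumption of this example, not part of the Howso API.

```python
# Local full residuals copied from the output above.
# The list holds one dictionary for the one case that was reacted to.
residuals = [{
    'workclass': 0.05552155884324439,
    'relationship': 0.9488982970831705,
    'capital-gain': 3132,
    'marital-status': 0.894451821150195,
    'age': 5,
    'sex': 0,
    'fnlwgt': 79346,
    'capital-loss': 0,
    'education-num': 0,
    'education': 0,
    'race': 0.9321238485895957,
    'occupation': 0.7492518799140462,
    'hours-per-week': 3,
    'native-country': 0.1489881109797705,
    'target': 0.06859635519050711,
}]

# Continuous feature names, hard-coded from the feature attributes table
# in Setup (an assumption made to keep this sketch self-contained).
CONTINUOUS = {
    'age', 'fnlwgt', 'education-num',
    'capital-gain', 'capital-loss', 'hours-per-week',
}

# Take the dictionary for the first (and only) case.
case_residuals = residuals[0]

# Split the residuals by feature type.
continuous_residuals = {
    name: value for name, value in case_residuals.items()
    if name in CONTINUOUS
}
nominal_residuals = {
    name: value for name, value in case_residuals.items()
    if name not in CONTINUOUS
}

print("Continuous:", continuous_residuals)
print("Nominal:", nominal_residuals)
```

Because the sample in Setup is drawn at random, the values you see when running the notebook yourself will differ from the ones copied here.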

Global Residuals#

Howso can retrieve both local and global metrics. Global metrics are retrieved using Trainee.react_aggregate(). Both robust and non-robust (full) versions are also available.

[3]:
# Get global full residuals
residuals = trainee.react_aggregate(
    details={'feature_full_residuals': True},
)
residuals
[3]:
{'feature_full_residuals': {'workclass': 0.3781095646460791,
  'relationship': 0.31080776857046544,
  'capital-gain': 1480.634510546924,
  'sex': 0.2370098757430592,
  'marital-status': 0.2275392590599643,
  'age': 8.813090014097074,
  'fnlwgt': 80042.553611248,
  'capital-loss': 164.72813042173732,
  'race': 0.20188432692970415,
  'occupation': 0.7788787488614444,
  'hours-per-week': 8.241933687087792,
  'education': 0.0010907657003103792,
  'education-num': 0.24581046793439118,
  'native-country': 0.135475182620448,
  'target': 0.18480587347693922}}
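
One way to read the two results together is to compare the residuals for a single case against the global residuals, feature by feature. The sketch below does this with a simple ratio, using values copied from the react() output in the Local Residuals section and the react_aggregate() output above. The helper residual_ratios() is a hypothetical name introduced for this illustration and is not part of the Howso API. A ratio above 1 suggests the case's residual for that feature is larger than the global value.

```python
# Global full residuals copied from the react_aggregate() output above.
global_residuals = {
    'workclass': 0.3781095646460791,
    'relationship': 0.31080776857046544,
    'capital-gain': 1480.634510546924,
    'sex': 0.2370098757430592,
    'marital-status': 0.2275392590599643,
    'age': 8.813090014097074,
    'fnlwgt': 80042.553611248,
    'capital-loss': 164.72813042173732,
    'race': 0.20188432692970415,
    'occupation': 0.7788787488614444,
    'hours-per-week': 8.241933687087792,
    'education': 0.0010907657003103792,
    'education-num': 0.24581046793439118,
    'native-country': 0.135475182620448,
    'target': 0.18480587347693922,
}

# Local full residuals for one case, copied from the react() output
# in the Local Residuals section.
local_residuals = {
    'workclass': 0.05552155884324439,
    'relationship': 0.9488982970831705,
    'capital-gain': 3132,
    'marital-status': 0.894451821150195,
    'age': 5,
    'sex': 0,
    'fnlwgt': 79346,
    'capital-loss': 0,
    'education-num': 0,
    'education': 0,
    'race': 0.9321238485895957,
    'occupation': 0.7492518799140462,
    'hours-per-week': 3,
    'native-country': 0.1489881109797705,
    'target': 0.06859635519050711,
}


def residual_ratios(local, global_):
    """Return (feature, local / global) pairs, largest ratio first.

    Hypothetical helper for this sketch. Features whose global
    residual is zero are skipped to avoid dividing by zero.
    """
    ratios = [
        (name, local[name] / global_value)
        for name, global_value in global_.items()
        if name in local and global_value != 0
    ]
    return sorted(ratios, key=lambda pair: pair[1], reverse=True)


ranked = residual_ratios(local_residuals, global_residuals)
for name, ratio in ranked:
    print(f"{name:15s} {ratio:6.2f}")
```

A ratio is used here instead of a difference because the continuous residuals sit on very different scales (compare age with fnlwgt), which would make raw differences hard to compare across features.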

API References#