Residuals#
Objectives: what you will take away#
How to retrieve global and local residuals.
Prerequisites: before you begin#
You have successfully installed Howso Engine.
You have an understanding of Howso’s basic workflow.
Data#
Our example dataset for this recipe is the well-known Adult dataset. It is accessible via the pmlb package installed earlier. We use the fetch_data() function to retrieve the dataset in the Setup section below.
Concepts & Terminology#
How-To Guide#
Setup#
This guide assumes you have created and set up a Trainee as demonstrated in the basic workflow.
The created Trainee will be referenced as trainee in the sections below.
[1]:
import pandas as pd
from pmlb import fetch_data
from howso.engine import Trainee
from howso.utilities import infer_feature_attributes
df = fetch_data('adult').sample(1_000)
features = infer_feature_attributes(df)
trainee = Trainee(features=features)
trainee.train(df)
trainee.analyze()
features.to_dataframe()
/home/docs/checkouts/readthedocs.org/user_builds/diveplane-howso-docs/envs/latest/lib/python3.11/site-packages/howso/utilities/feature_attributes/pandas.py:148: UserWarning: You have one or more suggestions to consider for your feature attributes configuration. Please view them by printing the `suggestions` property of your returned feature attributes object (`your_attributes_object.suggestions`).
warnings.warn(suggestion_warning, UserWarning)
[1]:
| | type | decimal_places | bounds.min | bounds.max | bounds.allow_null | bounds.observed_min | bounds.observed_max | data_type | original_type.data_type | original_type.size |
|---|---|---|---|---|---|---|---|---|---|---|
| age | continuous | 0 | 0.0 | 124.0 | True | 17.0 | 82.0 | number | numeric | 8 |
| workclass | nominal | 0 | NaN | NaN | False | NaN | NaN | number | integer | 8 |
| fnlwgt | continuous | 0 | 0.0 | 1559326.0 | True | 19847.0 | 953588.0 | number | numeric | 8 |
| education | nominal | 0 | NaN | NaN | False | NaN | NaN | number | integer | 8 |
| education-num | continuous | 0 | 0.0 | 25.0 | True | 2.0 | 16.0 | number | numeric | 8 |
| marital-status | nominal | 0 | NaN | NaN | False | NaN | NaN | number | integer | 8 |
| occupation | nominal | 0 | NaN | NaN | False | NaN | NaN | number | integer | 8 |
| relationship | nominal | 0 | NaN | NaN | False | NaN | NaN | number | integer | 8 |
| race | nominal | 0 | NaN | NaN | False | NaN | NaN | number | integer | 8 |
| sex | nominal | 0 | NaN | NaN | False | NaN | NaN | number | integer | 8 |
| capital-gain | continuous | 0 | 0.0 | 164870.0 | True | 0.0 | 99999.0 | number | numeric | 8 |
| capital-loss | continuous | 0 | 0.0 | 3982.0 | True | 0.0 | 2415.0 | number | numeric | 8 |
| hours-per-week | continuous | 0 | 0.0 | 162.0 | True | 2.0 | 99.0 | number | numeric | 8 |
| native-country | nominal | 0 | NaN | NaN | False | NaN | NaN | number | integer | 8 |
| target | nominal | 0 | NaN | NaN | False | NaN | NaN | number | integer | 8 |
Local Residuals#
Local metrics are retrieved using Trainee.react().
Both robust and non-robust (full) versions are available, although full
is recommended for residuals.
[2]:
# Get local full residuals
details = {'feature_full_residuals_for_case': True}
results = trainee.react(
    df.iloc[[-1]],
    context_features=features.get_names(without=["target"]),
    action_features=["target"],
    details=details,
)
residuals = results['details']['feature_full_residuals_for_case']
residuals
[2]:
| | native-country | capital-loss | target | fnlwgt | sex | workclass | relationship | age | race | capital-gain | education-num | occupation | marital-status | education | hours-per-week |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [0.10497122025332606, 0] | [0, 0] | [0, 0] | [3892.8891464159533, 0] | [0.5347950770078824, 0] | [0.11679499294505225, 0] | [0.25755208025854126, 0] | [0.8980091835356987, 0] | [0.04824150751552414, 0] | [0, 0] | [0, 0] | [0.648723288676678, 0] | [0.05021972175534373, 0] | [0, 0] | [8.596570848836706, 0] |
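The cell above requests the full variant. As a minimal sketch of how the robust variant might be requested instead, the hypothetical helper below builds the `details` dictionary for either variant. The full key names are the ones used in this guide; the robust key names are an assumption made by analogy and should be checked against the `Trainee.react()` and `Trainee.react_aggregate()` documentation for your installed version.

```python
def residual_details(robust: bool = False, local: bool = True) -> dict:
    """Build a `details` dict that requests one kind of residual.

    Hypothetical helper for illustration; it is not part of Howso Engine.
    """
    # The "full" key names appear in this guide. The "robust" names are
    # assumed by analogy and may differ between Howso Engine versions.
    variant = "robust" if robust else "full"
    key = f"feature_{variant}_residuals"
    if local:
        # Local (per-case) residuals use the "_for_case" suffix.
        key += "_for_case"
    return {key: True}


print(residual_details())
# {'feature_full_residuals_for_case': True}
print(residual_details(robust=True, local=False))
# {'feature_robust_residuals': True}
```

The returned dictionary would be passed as `details=` to `trainee.react(...)` for local residuals, or to `trainee.react_aggregate(...)` for global residuals.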
Global Residuals#
Howso can retrieve both local and global metrics.
Global metrics are retrieved using Trainee.react_aggregate(). Both robust and non-robust (full) versions are also available.
[3]:
# Get global full residuals
residuals = trainee.react_aggregate(
    details={'feature_full_residuals': True},
).to_dataframe()
residuals
[3]:
| | native-country | capital-loss | target | fnlwgt | sex | workclass | relationship | age | race | occupation | education-num | capital-gain | marital-status | education | hours-per-week |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| feature_full_residuals | [0.17635573108240263, 0] | [163.0842452028395, 0] | [0.21930926983381843, 0] | [81408.23083480468, 0] | [0.2607118895803338, 0] | [0.3669314376133491, 0] | [0.3186086818086218, 0] | [8.3280741172867, 0] | [0.20994217929625442, 0] | [0.7944257293597181, 0] | [0.16888651553946177, 0] | [2155.029487759862, 0] | [0.21729213822237398, 0] | [9.032563585975595e-14, 0] | [7.691471588572398, 0] |
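Judging by the magnitudes in the outputs above, residuals for continuous features are expressed on each feature's own scale, so local and global values are easiest to compare feature by feature. The sketch below is only an illustration: it divides the local residual by the global one for a few features, using values copied (and rounded) from the first element of the cells shown above. The helper name `local_to_global_ratio` is hypothetical, and because the setup cell draws a random sample, your own numbers will differ.

```python
# Values copied (rounded) from the local and global outputs above.
local = {
    "age": 0.898,
    "fnlwgt": 3892.889,
    "hours-per-week": 8.597,
    "education": 0.0,
}
global_ = {
    "age": 8.328,
    "fnlwgt": 81408.231,
    "hours-per-week": 7.691,
    "education": 9.03e-14,
}


def local_to_global_ratio(local, global_, eps=1e-9):
    """Return the local / global residual ratio per feature.

    Features whose global residual is effectively zero are skipped,
    because the ratio is not meaningful there.
    """
    return {
        name: local[name] / global_[name]
        for name in local
        if name in global_ and abs(global_[name]) > eps
    }


ratios = local_to_global_ratio(local, global_)
# Print features from the smallest ratio to the largest.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{name}: {ratio:.2f}")
```

A ratio well below 1 suggests the residual around this case is smaller than the dataset-wide residual for that feature, while a ratio above 1 suggests the opposite.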