Conviction
Objectives: what you will take away
How to retrieve familiarity, similarity, and prediction residual conviction metrics.
Prerequisites: before you begin
You’ve successfully installed Howso Engine.
You have an understanding of Howso’s basic workflow.
Data
Our example dataset for this recipe is the well-known Adult dataset. It is accessible via the pmlb package, whose fetch_data() function is used to retrieve the dataset in the complete code below.
Concepts & Terminology
How-To Guide
Familiarity Conviction and Similarity Conviction measure how surprising a case is, which makes them useful for tasks such as anomaly detection. Prediction Residual Conviction lets you drill down into a specific case and examine its individual features: it measures how surprising each of the case’s feature values is, so it can reveal why a case was anomalous. For example, if an NBA player’s height were recorded as 3 feet, that value would be very surprising, since most NBA players are very tall.
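To build intuition for what "surprising" means in the NBA example, the toy sketch below scores a single height with a ratio of expected surprisal to observed surprisal, so values below 1 are more surprising than a typical draw. This is our simplification for illustration, not Howso's computation: it assumes a normal height distribution whose parameters (MEAN_HEIGHT, STD_HEIGHT) are made up, and toy_conviction() is a hypothetical helper.

```python
import math

# Hypothetical parameters (inches) chosen only for illustration; they are not
# taken from any real dataset of NBA player heights.
MEAN_HEIGHT = 78.0
STD_HEIGHT = 3.5


def surprisal(x: float, mean: float, std: float) -> float:
    """Negative log-density (in nats) of x under a normal distribution."""
    z = (x - mean) / std
    return 0.5 * z * z + math.log(std * math.sqrt(2.0 * math.pi))


def expected_surprisal(std: float) -> float:
    """Mean surprisal of a normal distribution, i.e. its differential entropy."""
    return 0.5 * math.log(2.0 * math.pi * math.e * std * std)


def toy_conviction(x: float, mean: float = MEAN_HEIGHT, std: float = STD_HEIGHT) -> float:
    """Hypothetical helper: expected surprisal divided by observed surprisal.

    Values below 1 mean x is more surprising than a typical draw. The ratio is
    only meaningful while surprisal(x) stays positive, which holds for the
    parameters above.
    """
    return expected_surprisal(std) / surprisal(x, mean, std)


for height in (78.0, 84.0, 36.0):  # typical, tall, and 3 feet
    print(f"{height:5.1f} in -> toy conviction {toy_conviction(height):.3f}")
```

With these parameters a 3-foot (36-inch) height gets a conviction far below 1, while a height at the mean lands slightly above 1.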
Setup
This guide assumes you have created and set up a Trainee as demonstrated in the basic workflow guide. The created Trainee will be referenced as trainee in the sections below. This guide also assumes you have installed the pmlb Python library, which provides the dataset used.
Familiarity Conviction
There are two types of Familiarity Conviction, both computed when Trainee.react_into_features() is called: familiarity_conviction_addition is the familiarity conviction of adding the specified case, and familiarity_conviction_removal is the familiarity conviction of removing the specified case. Trainee.react_into_features() stores these convictions as case features, which can then be retrieved with Trainee.get_cases():
trainee.react_into_features(
    familiarity_conviction_addition=True,
    familiarity_conviction_removal=True
)
familiarity_conviction = trainee.get_cases(
    session=trainee.active_session,
    features=[
        'familiarity_conviction_addition',
        'familiarity_conviction_removal'
    ]
)
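Because get_cases() returns one row of conviction values per case, a common next step is to rank cases by conviction. The sketch below assumes that a lower conviction means a more surprising case, and it works on five rows copied from this guide's sample output so that it runs without a Trainee; most_anomalous() is a hypothetical helper, not part of the Howso API.

```python
# Familiarity conviction values for cases 0-4, copied from this guide's sample
# output. In practice they come from trainee.get_cases() as shown above.
familiarity_rows = {
    0: {"familiarity_conviction_addition": 2.036422, "familiarity_conviction_removal": 1.424936},
    1: {"familiarity_conviction_addition": 7.239347, "familiarity_conviction_removal": 6.785857},
    2: {"familiarity_conviction_addition": 0.796667, "familiarity_conviction_removal": 1.103851},
    3: {"familiarity_conviction_addition": 0.499284, "familiarity_conviction_removal": 0.293830},
    4: {"familiarity_conviction_addition": 0.486247, "familiarity_conviction_removal": 0.727024},
}


def most_anomalous(rows: dict, feature: str, n: int = 3) -> list:
    """Hypothetical helper: the n case indices with the lowest value of `feature`.

    Assumes lower conviction means a more surprising case.
    """
    return sorted(rows, key=lambda idx: rows[idx][feature])[:n]


print(most_anomalous(familiarity_rows, "familiarity_conviction_addition"))
print(most_anomalous(familiarity_rows, "familiarity_conviction_removal"))
```

On the full pandas DataFrame returned by get_cases(), familiarity_conviction.nsmallest(3, 'familiarity_conviction_addition') is the equivalent one-liner.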
Similarity Conviction
Similarity Conviction is a single metric that is also computed when Trainee.react_into_features() is called, this time with similarity_conviction=True:
trainee.react_into_features(similarity_conviction=True)
similarity_conviction = trainee.get_cases(
    session=trainee.active_session,
    features=['similarity_conviction']
)
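Familiarity and similarity conviction score a case from different angles, so one option is to flag only the cases that both metrics find surprising. The sketch below does that for five rows copied from this guide's sample output; the threshold of 1 and the flag_cases() helper are hypothetical choices, not part of the Howso API.

```python
# Conviction values for cases 0-4, copied from this guide's sample output.
familiarity_addition = [2.036422, 7.239347, 0.796667, 0.499284, 0.486247]
similarity = [0.294567, 0.561961, 3.656589, 0.326939, 1.786323]

# Hypothetical cut-off: treat conviction below 1 as "more surprising than usual".
THRESHOLD = 1.0


def flag_cases(fam: list, sim: list, threshold: float = THRESHOLD) -> list:
    """Hypothetical helper: indices whose familiarity AND similarity conviction fall below threshold."""
    return [
        i
        for i, (f, s) in enumerate(zip(fam, sim))
        if f < threshold and s < threshold
    ]


print(flag_cases(familiarity_addition, similarity))
```

Requiring both metrics to agree trades recall for fewer false alarms; using "or" instead of "and" would flag more cases.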
Prediction Residual Conviction
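Prediction residual conviction is accessed through Trainee.react() by requesting the feature_full_residual_convictions_for_case detail. It measures how noisy each feature value of a case is relative to the expected level of noise for that feature. The example below reacts to the first case trained in the active session and keeps all of its feature values unchanged.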
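# Request the full residual conviction of every feature for the reacted case
details = {'feature_full_residual_convictions_for_case': True}

# case_indices expects (session ID, session training index) pairs;
# here we use the first case trained in the active session
session_training_indices = trainee.get_session_training_indices(trainee.active_session)
session_training_indices = [(trainee.active_session.id, session_training_indices[0])]

# Keep every feature value of the case as-is rather than predicting it
# (features is the output of infer_feature_attributes() from the setup)
reaction = trainee.react(
    case_indices=session_training_indices,
    preserve_feature_values=features.get_names(),
    details=details,
)
residual_conviction = reaction["details"]["feature_full_residual_convictions_for_case"]
print(residual_conviction)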
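To see which feature values make the reacted case surprising, the per-feature convictions can be sorted. The sketch below uses the list printed in this guide's sample output so that it runs without a Trainee, and it assumes that, as with the other conviction metrics, a lower residual conviction marks a more surprising value; most_surprising_features() is a hypothetical helper.

```python
# Output of feature_full_residual_convictions_for_case for one case, copied
# from this guide's sample output: a list holding one dict of
# {feature name: residual conviction}.
residual_conviction = [{
    'marital-status': 1.2677661102272322, 'race': 268.342074,
    'target': 2.106858563940631, 'fnlwgt': 3.3412368323793595,
    'education': 1, 'age': 0.6401583699403189, 'education-num': 1,
    'sex': 1.3191579118090324, 'occupation': 1.4279943800852224,
    'capital-loss': 1016.0995061247426, 'relationship': 1.6248012684002686,
    'workclass': 0.218840321217643, 'hours-per-week': 0.7026121145303584,
    'capital-gain': 2.528308627762195, 'native-country': 2.4681265907924397,
}]


def most_surprising_features(case_convictions: dict, n: int = 3) -> list:
    """Hypothetical helper: the n (feature, conviction) pairs with the lowest conviction."""
    return sorted(case_convictions.items(), key=lambda item: item[1])[:n]


for feature, value in most_surprising_features(residual_conviction[0]):
    print(f"{feature}: {value:.3f}")
```

For the sample case, workclass has the lowest residual conviction, which would make it the first feature to inspect when explaining why the case stands out.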
Complete Code
The code from all of the steps in this guide is combined below:
from pmlb import fetch_data
from howso.engine import Trainee
from howso.utilities import infer_feature_attributes

# Load a 1,000-row random sample of the Adult dataset and infer feature attributes
df = fetch_data('adult').sample(1_000)
features = infer_feature_attributes(df)
print(features.to_dataframe())

# Create, train, and analyze the Trainee
trainee = Trainee(features=features)
trainee.train(df)
trainee.analyze()

# Familiarity conviction of adding and of removing each trained case
trainee.react_into_features(
    familiarity_conviction_addition=True,
    familiarity_conviction_removal=True
)
familiarity_conviction = trainee.get_cases(
    session=trainee.active_session,
    features=[
        'familiarity_conviction_addition',
        'familiarity_conviction_removal'
    ]
)
print(familiarity_conviction)

# Similarity conviction of each trained case
trainee.react_into_features(similarity_conviction=True)
similarity_conviction = trainee.get_cases(
    session=trainee.active_session,
    features=['similarity_conviction']
)
print(similarity_conviction)

# Per-feature residual conviction for the first case trained in the active session
details = {'feature_full_residual_convictions_for_case': True}
session_training_indices = trainee.get_session_training_indices(trainee.active_session)
session_training_indices = [(trainee.active_session.id, session_training_indices[0])]
reaction = trainee.react(
    case_indices=session_training_indices,
    preserve_feature_values=features.get_names(),
    details=details,
)
residual_conviction = reaction["details"]["feature_full_residual_convictions_for_case"]
print(residual_conviction)
Below is an example of expected output from this sample code:
$ python conviction_example.py
                      type  decimal_places  bounds  ...  data_type  original_type
                                               min  ...             data_type  size
age             continuous               0     0.0  ...     number    numeric     8
workclass          nominal               0     NaN  ...     number    integer     8
fnlwgt          continuous               0     0.0  ...     number    numeric     8
education          nominal               0     NaN  ...     number    integer     8
education-num   continuous               0     0.0  ...     number    numeric     8
marital-status     nominal               0     NaN  ...     number    integer     8
occupation         nominal               0     NaN  ...     number    integer     8
relationship       nominal               0     NaN  ...     number    integer     8
race               nominal               0     NaN  ...     number    integer     8
sex                nominal               0     NaN  ...     number    integer     8
capital-gain    continuous               0     0.0  ...     number    numeric     8
capital-loss    continuous               0     0.0  ...     number    numeric     8
hours-per-week  continuous               0     0.0  ...     number    numeric     8
native-country     nominal               0     NaN  ...     number    integer     8
target             nominal               0     NaN  ...     number    integer     8
[15 rows x 10 columns]
     familiarity_conviction_addition  familiarity_conviction_removal
0                           2.036422                        1.424936
1                           7.239347                        6.785857
2                           0.796667                        1.103851
3                           0.499284                        0.293830
4                           0.486247                        0.727024
..                               ...                             ...
995                         3.315124                        3.537327
996                         2.741911                        1.970860
997                       133.074640                      118.813386
998                         8.028792                        7.916007
999                         0.691341                        0.819138
[1000 rows x 2 columns]
     similarity_conviction
0                 0.294567
1                 0.561961
2                 3.656589
3                 0.326939
4                 1.786323
..                     ...
995               0.496362
996               0.504448
997               0.612071
998               0.960858
999               1.065963
[1000 rows x 1 columns]
[{'marital-status': 1.2677661102272322, 'race': 268.342074, 'target': 2.106858563940631, 'fnlwgt': 3.3412368323793595, 'education': 1, 'age': 0.6401583699403189, 'education-num': 1, 'sex': 1.3191579118090324, 'occupation': 1.4279943800852224, 'capital-loss': 1016.0995061247426, 'relationship': 1.6248012684002686, 'workclass': 0.218840321217643, 'hours-per-week': 0.7026121145303584, 'capital-gain': 2.528308627762195, 'native-country': 2.4681265907924397}]