howso.scikit

Classes

HowsoEstimator

This class is intended for use within scikit-learn only.

HowsoRegressor

A HowsoEstimator for regression analysis.

HowsoClassifier

A HowsoEstimator for classification analysis.

The Python API for the Howso Scikit Client.

class howso.scikit.HowsoClassifier(client=None, features=None, targets=None, verbose=False, debug=False, ttl=43200000, client_params=None, trainee_params=None)

Bases: HowsoEstimator

A HowsoEstimator for classification analysis.

Parameters:
  • features (dict of str: dict, default None) –

    The features that will predict the target(s). Will be generated automatically if not specified.

    Example:

    {
        "feature_name": {
            "parameter1" : "value1",
            "parameter2" : "value2"
        },
        "length": { "type" : "continuous", "decimal_places": 1 },
        "width": { "type" : "continuous", "significant_digits": 4 },
        "degrees": { "type" : "continuous", "cycle_length": 360 },
        "class": { "type" : "nominal" }
    }
    

  • targets (dict of str: dict, default None) –

    The target(s) to be predicted. Will be generated automatically if not specified.

    Example:

    {
        "target_name": {
            "parameter1" : "value1",
            "parameter2" : "value2"
        },
        "klass": { "type" : "nominal" }
    }
    

  • client (AbstractHowsoClient, default None) – A subclass of AbstractHowsoClient used to interface with Howso.

  • verbose (boolean, default False) – A flag for verbose output.

  • debug (boolean, default False) – A flag for debug output.

  • ttl (int, in milliseconds, default 43200000) – The maximum time a server should keep a connection open for a trainee when processing requests. The default is 12 hours.

  • client_params (dict, default None) – The parameters with which to instantiate the client.

  • trainee_params (dict, default None) – The parameters with which to instantiate the trainee. Intended for use by HowsoEstimator.get_params.
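Because ttl is given in milliseconds, its magnitude is easy to misread. The two conversion helpers below are hypothetical (they are not part of howso.scikit); they simply show that the default of 43200000 corresponds to 12 hours.

```python
# ttl is expressed in milliseconds; these hypothetical helpers convert to and from hours.
MS_PER_HOUR = 60 * 60 * 1000


def hours_to_ttl(hours: float) -> int:
    """Convert a duration in hours to a ttl value in milliseconds."""
    return int(round(hours * MS_PER_HOUR))


def ttl_to_hours(ttl_ms: int) -> float:
    """Convert a ttl value in milliseconds back to hours."""
    return ttl_ms / MS_PER_HOUR


print(ttl_to_hours(43200000))  # 12.0, the documented default
print(hours_to_ttl(1))  # 3600000
```

A shorter lifetime could then be requested with, for example, HowsoClassifier(ttl=hours_to_ttl(1)).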

fit(X, y, analyze=True)

Fit a model with Howso.

Parameters:
  • X (numpy.ndarray, shape (n_samples, n_features)) – Data

  • y (numpy.ndarray, shape (n_samples,)) – Target. Will be cast to X’s dtype if necessary.

  • analyze (bool, default=True) –

    (Optional) Whether the trainee should be analyzed after fitting.

    • Set this to False if you plan to call analyze() yourself after fit() to specify parameters.

Returns:

self

Return type:

HowsoEstimator

load(trainee_id)

Load the trainee and re-populate the classes_ variable.

This is based on the available classes in the loaded trainee.

Parameters:

trainee_id (str) – The id of the trainee.

partial_fit(X, y)

Add data to an existing Howso model.

Parameters:
  • X (numpy.ndarray, shape (n_samples, n_features)) – Data

  • y (numpy.ndarray, shape (n_samples,)) – Target. Will be cast to X’s dtype if necessary.

predict_proba(X)

Probability estimates.

The returned estimates for all classes are ordered by class label.

For a multi-class problem, if multi_class is set to “multinomial”, the softmax function is used to find the predicted probability of each class. Otherwise a one-vs-rest approach is used, i.e., the probability of each class is calculated by assuming it to be positive and applying the logistic function, and these values are then normalized across all the classes.

NOTE: Only works with single-target models at this time.

Parameters:

X (numpy.ndarray, shape (n_samples, n_features)) – Data

Returns:

The probabilities of the classes for the given prediction.

Return type:

numpy.ndarray, shape (n_samples, n_classes)
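Since the columns of predict_proba follow class-label order, one row of output can be paired with its labels by sorting the distinct labels. This is a plain-Python sketch that does not call Howso: with a fitted HowsoClassifier the classes_ attribute would normally supply the labels, and sorting the training labels here is an assumption based on the ordering statement above.

```python
from typing import Dict, Hashable, Sequence


def label_probabilities(
    labels: Sequence[Hashable], proba_row: Sequence[float]
) -> Dict[Hashable, float]:
    """Pair one row of predict_proba output with its class labels."""
    classes = sorted(set(labels))  # columns are ordered by class label
    if len(classes) != len(proba_row):
        raise ValueError("expected one probability per class")
    return dict(zip(classes, proba_row))


def most_likely(labels: Sequence[Hashable], proba_row: Sequence[float]) -> Hashable:
    """Return the label with the highest probability in the row."""
    probs = label_probabilities(labels, proba_row)
    return max(probs, key=probs.get)


y_train = [2, 0, 1, 0, 2]
row = [0.1, 0.7, 0.2]
print(label_probabilities(y_train, row))  # {0: 0.1, 1: 0.7, 2: 0.2}
print(most_likely(y_train, row))  # 1
```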

set_fit_request(*, analyze: bool | None | str = '$UNCHANGED$') HowsoClassifier

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

analyze (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for analyze parameter in fit.

Returns:

self – The updated object.

Return type:

object
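The four documented request values can be modeled with a small pure-Python function to make the routing rules concrete. route_analyze is a hypothetical helper that mimics what a meta-estimator does with the analyze metadata; it is not part of scikit-learn or howso.scikit, and it ignores the UNCHANGED default, which only preserves an earlier request.

```python
from typing import Any, Dict, Union

# The four documented request values: True, False, None, or an alias string.
Request = Union[bool, None, str]


def route_analyze(request: Request, provided: Dict[str, Any]) -> Dict[str, Any]:
    """Return the keyword arguments forwarded to fit() for 'analyze'.

    `provided` holds the metadata the user passed to the meta-estimator.
    """
    if request is True:
        # Requested: forward the value if it was provided, otherwise ignore.
        return {"analyze": provided["analyze"]} if "analyze" in provided else {}
    if request is False:
        # Not requested: the meta-estimator never forwards it.
        return {}
    if request is None:
        # Not requested, and providing it is an error.
        if "analyze" in provided:
            raise ValueError("'analyze' was provided but not requested by fit")
        return {}
    # Alias: the user supplies the value under the alias name instead.
    return {"analyze": provided[request]} if request in provided else {}


print(route_analyze("howso_analyze", {"howso_analyze": False}))  # {'analyze': False}
```

In real code the request is registered on the estimator, for example HowsoClassifier().set_fit_request(analyze=False) after calling sklearn.set_config(enable_metadata_routing=True), and the meta-estimator applies these rules itself.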

class howso.scikit.HowsoEstimator(client=None, features=None, targets=None, method=None, verbose=False, debug=False, ttl=43200000, trainee_params=None, client_params=None)

Bases: BaseEstimator

This class is intended for use within scikit-learn only.

This Estimator follows scikit-learn’s conventions. For access to a wider range of Howso capabilities, please use the client specified in the howso.client module.

Parameters:
  • features (dict of str: dict, default None) –

    The features that will predict the target(s). Will be generated automatically if not specified.

    Example:

    {
        "feature_name": {
            "parameter1" : "value1",
            "parameter2" : "value2"
        },
        "length": { "type" : "continuous", "decimal_places": 1 },
        "width": { "type" : "continuous", "significant_digits": 4 },
        "degrees": { "type" : "continuous", "cycle_length": 360 },
        "class": { "type" : "nominal" }
    }
    

  • targets (dict of str: dict, default None) –

    The target(s) to be predicted. Will be generated automatically if not specified.

    Example:

    {
        "target_name": {
            "parameter1" : "value1",
            "parameter2" : "value2"
        },
        "klass": { "type" : "nominal" }
    }
    

  • client (AbstractHowsoClient, default None) – A subclass of AbstractHowsoClient used to interface with Howso.

  • method (str) – One of ‘classification’ or ‘regression’.

  • verbose (boolean, default False) – A flag for verbose output.

  • debug (boolean, default False) – A flag for debug output.

  • ttl (int, in milliseconds, default 43200000) – The maximum time a server should keep a connection open for a trainee when processing requests. The default is 12 hours.

  • client_params (dict, default None) – The parameters with which to instantiate the client.

  • trainee_params (dict, default None) – The parameters with which to instantiate the trainee.

Examples

>>> import pandas as pd
>>> from howso.scikit import HowsoClassifier
>>> from sklearn.model_selection import train_test_split
>>> # Read in the data.
>>> df = pd.read_csv('iris.csv')
>>>
>>> # Split the dataset into the features (X) and target (y) and convert
>>> # the string targets into integer hashes.
>>> X = df.drop('class', axis=1).values.astype(float)
>>> y = df['class'].apply(hash).values.astype(int)
>>>
>>> # Split the dataset into an 80/20 train/test set.
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True, random_state=1)
>>>
>>> # Create a classifier.
>>> howso = HowsoClassifier()
>>>
>>> # Fit the training data.
>>> howso.fit(X_train, y_train)
>>>
>>> # Test against the reserved test data.
>>> score = howso.score(X_test, y_test)
>>>
>>> # Print the resulting accuracy.
>>> print(score)
0.9666666666666667

analyze(seed=None, **kwargs)

Analyze a trainee.

Parameters:
  • seed (int, optional) – A random seed.

  • **kwargs – Refer to the docstring of the howso.client.analyze method for a complete reference of all parameters.

delete()

Delete this trainee from the Howso cloud service.

describe_prediction(X, details=None)

Describe a prediction in detail.

Parameters:
  • X (numpy.ndarray) – Feature values.

  • details (dict, default None) –

    (Optional) If details are specified, the response will contain the requested explanation data along with the reaction. Below are the valid keys and data types for the different audit details. Omitted keys, values set to None, or False values for Booleans will not be included in the audit data returned.

    • boundary_cases: bool, optional

      If True, outputs an automatically determined (when ‘num_boundary_cases’ is not specified) relevant number of boundary cases. Uses both context and action features of the reacted case to determine the counterfactual boundary based on action features, which maximize the dissimilarity of action features while maximizing the similarity of context features. If action features aren’t specified, uses familiarity conviction to determine the boundary instead.

    • boundary_cases_familiarity_convictions: bool, optional

      If True, outputs familiarity conviction of addition for each of the boundary cases.

    • case_contributions: bool, optional

      If True, outputs each influential case’s differences between the predicted action feature value and the predicted action feature value if each individual case were not included. Uses only the context features of the reacted case to determine that area. Relies on ‘robust_influences’ parameter to determine whether to do standard or robust computation.

    • case_feature_residuals: bool, optional

      If True, outputs feature residuals for all (context and action) features for just the specified case. Uses leave-one-out for each feature, while using the others to predict the left out feature with their corresponding values from this case. Relies on ‘robust_residuals’ parameter to determine whether to do standard or robust computation.

    • case_mda: bool, optional

      If True, outputs each influential case’s mean decrease in accuracy of predicting the action feature in the local model area, as if each individual case were included versus not included. Uses only the context features of the reacted case to determine that area. Relies on ‘robust_influences’ parameter to determine whether to do standard or robust computation.

    • categorical_action_probabilities: bool, optional

      If True, outputs probabilities for each class for the action. Applicable only to categorical action features.

    • derivation_parameters: bool, optional

      If True, outputs a dictionary of the parameters used in the react call. These include k, p, distance_transform, feature_weights, feature_deviations, nominal_class_counts, and use_irw.

      • k: the number of cases used for the local model.

      • p: the parameter for the Lebesgue space.

      • distance_transform: the distance transform used as an exponent to convert distances to raw influence weights.

      • feature_weights: the weight for each feature used in the distance metric.

      • feature_deviations: the deviation for each feature used in the distance metric.

      • nominal_class_counts: the number of unique values for each nominal feature. This is used in the distance metric.

      • use_irw: a flag indicating if feature weights were derived using inverse residual weighting.

    • distance_contribution: bool, optional

      If True, outputs the distance contribution (expected total surprisal contribution) for the reacted case. Uses both context and action feature values.

    • distance_ratio: bool, optional

      If True, outputs the ratio of distance (relative surprisal) between this reacted case and its nearest case to the minimum distance (relative surprisal) between the two closest cases in the local area. All distances are computed using only the specified context features.

    • feature_contributions: bool, optional

      If True, outputs each context feature’s absolute and directional differences between the predicted action feature value and the predicted action feature value if each context feature were not in the model for all context features in the local model area. Relies on ‘robust_influences’ parameter to determine whether to do standard or robust computation. Directional feature contributions are returned under the key ‘directional_feature_contributions’.

    • case_feature_contributions: bool, optional

      If True, outputs each context feature’s absolute and directional differences between the predicted action feature value and the predicted action feature value if each context feature were not in the model for all context features in this case, using only the values from this specific case. Relies on ‘robust_influences’ parameter to determine whether to do standard or robust computation. Directional case feature contributions are returned under the ‘case_directional_feature_contributions’ key.

    • feature_mda: bool, optional

      If True, outputs each context feature’s mean decrease in accuracy of predicting the action feature given the context. Uses only the context features of the reacted case to determine that area. Relies on ‘robust_influences’ parameter to determine whether to do standard or robust computation.

    • feature_mda_ex_post: bool, optional

      If True, outputs each context feature’s mean decrease in accuracy of predicting the action feature as an explanation detail given that the specified prediction was already made as specified by the action value. Uses both context and action features of the reacted case to determine that area. Relies on ‘robust_influences’ parameter to determine whether to do standard or robust computation.

    • features: list of str, optional

      A list of feature names specifying the features for which per-feature details (residuals, contributions, MDA, etc.) are computed. This generally reduces computation, but not when computing details robustly. Details are computed for all context and action features if this value is not specified.

    • feature_residuals: bool, optional

      If True, outputs feature residuals for all (context and action) features locally around the prediction. Uses only the context features of the reacted case to determine that area. Relies on ‘robust_residuals’ parameter to determine whether to do standard or robust computation.

    • global_case_feature_residual_convictions: bool, optional

      If True, outputs this case’s feature residual convictions for the global model. Computed as: global model feature residual divided by case feature residual. Relies on ‘robust_residuals’ parameter to determine whether to do standard or robust computation.

    • hypothetical_values: dict, optional

      A dictionary of feature name to feature value. If specified, shows how a prediction could change in a what-if scenario where the influential cases’ context feature values are replaced with the specified values. Iterates over all influential cases, predicting the action features for each one using the updated hypothetical values. Outputs the predicted arithmetic over the influential cases for each action feature.

    • influential_cases: bool, optional

      If True, outputs the most influential cases and their influence weights based on the surprisal of each case relative to the context being predicted among the cases. Uses only the context features of the reacted case.

    • influential_cases_familiarity_convictions: bool, optional

      If True, outputs familiarity conviction of addition for each of the influential cases.

    • influential_cases_raw_weights: bool, optional

      If True, outputs the surprisal for each of the influential cases.

    • local_case_feature_residual_convictions: bool, optional

      If True, outputs this case’s feature residual convictions for the region around the prediction. Uses only the context features of the reacted case to determine that region. Computed as: region feature residual divided by case feature residual. Relies on ‘robust_residuals’ parameter to determine whether to do standard or robust computation.

    • most_similar_cases: bool, optional

      If True, outputs an automatically determined (when ‘num_most_similar_cases’ is not specified) relevant number of similar cases, which will first include the influential cases. Uses only the context features of the reacted case.

    • num_boundary_cases: int, optional

      Outputs this manually specified number of boundary cases.

    • num_most_similar_cases: int, optional

      Outputs this manually specified number of most similar cases, which will first include the influential cases.

    • num_most_similar_case_indices: int, optional

      Outputs this specified number of most similar case indices when ‘distance_ratio’ is also set to True.

    • observational_errors: bool, optional

      If True, outputs observational errors for all features as defined in feature attributes.

    • outlying_feature_values: bool, optional

      If True, outputs the reacted case’s context feature values that are outside the min or max of the corresponding feature values of all the cases in the local model area. Uses only the context features of the reacted case to determine that area.

    • similarity_conviction: bool, optional

      If True, outputs similarity conviction for the reacted case. Uses both context and action feature values as the case values for all computations. This is defined as expected (local) distance contribution divided by reacted case distance contribution.

    • robust_computation: bool, optional

      Deprecated. If specified, will overwrite the value of both ‘robust_residuals’ and ‘robust_influences’.

    • robust_residuals: bool, optional

      Default is False. When False, uses leave-one-out for features (or cases, as needed) for all residual computations. When True, uses uniform sampling from the power set of all combinations of features (or cases, as needed) instead.

    • robust_influences: bool, optional

      Default is True. When False, uses leave-one-out for features (or cases, as needed) for all MDA and contribution computations. When True, uses uniform sampling from the power set of all combinations of features (or cases, as needed) instead.

    • generate_attempts: bool, optional

      If True, outputs the number of attempts taken to generate each case. Only applicable when ‘generate_new_cases’ is “always” or “attempt”.

    >>> details = {'num_most_similar_cases': 5,
    ...            'feature_residuals': True}
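The interaction between the deprecated robust_computation key and the two newer robust flags, including their documented defaults, can be summarized with a small helper. effective_robust_flags is a hypothetical illustration of the rules listed above, not code from howso.scikit; it assumes a value of None counts as unspecified, in line with the note that None values are not included.

```python
from typing import Mapping, Tuple


def _flag(details: Mapping[str, object], key: str, default: bool) -> bool:
    """Read a boolean detail, treating a missing key or None as unspecified."""
    value = details.get(key)
    return default if value is None else bool(value)


def effective_robust_flags(details: Mapping[str, object]) -> Tuple[bool, bool]:
    """Return (robust_residuals, robust_influences) for a details dict."""
    robust_residuals = _flag(details, "robust_residuals", False)
    robust_influences = _flag(details, "robust_influences", True)
    legacy = details.get("robust_computation")
    if legacy is not None:
        # The deprecated key overwrites both newer flags when specified.
        robust_residuals = robust_influences = bool(legacy)
    return robust_residuals, robust_influences


details = {"num_most_similar_cases": 5, "feature_residuals": True}
print(effective_robust_flags(details))  # (False, True)
```

For the example details above, feature residuals are therefore computed with leave-one-out, while any MDA or contribution details would use robust sampling.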
    

Returns:

Format of:

{
    'action': list of dicts of action_features -> action_values,
    'details': dict with requested audit data
}

Return type:

dict

feature_add(feature=None, value=None)

Add a feature to a trainee.

Parameters:
  • feature (str, optional) – The name of the feature. Will be generated automatically if not specified.

  • value (int or float or str, optional) – The value to populate the feature with.

feature_remove(feature=None)

Remove a feature from a trainee.

Parameters:

feature (str, default None) – Optional. The name of the feature to remove. Will quietly do nothing if the feature was not found.

fit(X, y, analyze=True)

Fit a model with Howso.

Parameters:
  • X (numpy.ndarray, shape (n_samples, n_features)) – Data

  • y (numpy.ndarray, shape (n_samples,)) – Target. Will be cast to X’s dtype if necessary.

  • analyze (bool, default=True) –

    Whether to analyze the trainee after fitting.

    • Set this to False if you plan to call analyze() yourself after fit() to specify parameters.

Returns:

self

Return type:

HowsoEstimator

get_case_conviction(X, features=None)

Return case conviction.

Parameters:
  • X (numpy.ndarray, shape (n_samples, n_features)) – Data

  • features (str or list of str) – The feature name(s) to use when calculating convictions.

Returns:

The conviction of the cases. Ex: [1.0, 3.2, 0.4]

Return type:

list

get_feature_conviction(features=None)

Get the conviction of the features in a model.

Parameters:

features (str or list of str) – Features to return conviction values for.

Returns:

A map of feature convictions and contributions.

Return type:

dict

get_params(deep=True)

Get parameters for this estimator.

This code is taken from the source of sklearn.base.BaseEstimator and lightly modified to avoid calling the get_params method of self.trainee.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

mapping of string to any

load(trainee_id)

Load a model from the server.

Parameters:

trainee_id (str) – ID of the trainee (available from the trainee_id property of this class).

partial_fit(X, y)

Add data to an existing Howso model.

Parameters:
  • X (numpy.ndarray, shape (n_samples, n_features)) – Data

  • y (numpy.ndarray, shape (n_samples,)) – Target. Will be cast to X’s dtype if necessary.

partial_unfit(precision, num_cases, criteria=None)

Remove training cases from a trainee.

Removed cases are completely purged from the model, and the model behaves as if it had never been trained with them.

Parameters:
  • precision (str) – The precision to use when removing the case. Options are ‘exact’ or ‘similar’.

  • num_cases (int) – The number of cases to remove; minimum 1 case must be removed.

  • criteria (dict, default None) – The condition map used to select the cases to remove; only cases that meet all the provided conditions are selected. Keys are feature names, and each value is one of: null (the case must have the feature), a single value (the feature value must match exactly), or an array of two values (a range; the feature value must fall between them).
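To preview which cases a criteria map would select before calling partial_unfit, the documented matching rules can be mirrored locally on plain dicts. The matches function is a hypothetical helper; it assumes ranges are inclusive and treats a case as having a feature when its value is present and not None, neither of which is stated above.

```python
from typing import Any, Mapping


def matches(case: Mapping[str, Any], criteria: Mapping[str, Any]) -> bool:
    """Return True if a case meets every condition in a criteria map."""
    for feature, condition in criteria.items():
        value = case.get(feature)
        if value is None:
            # Assumption: a missing or None value means the case lacks the feature.
            return False
        if condition is None:
            continue  # null condition: having the feature is enough
        if isinstance(condition, (list, tuple)) and len(condition) == 2:
            low, high = condition
            if not low <= value <= high:  # assumption: the range is inclusive
                return False
        elif value != condition:  # any other value must match exactly
            return False
    return True


cases = [
    {"petal_length": 1.4, "class": "setosa"},
    {"petal_length": 4.7, "class": "versicolor"},
    {"petal_length": 5.1},
]
criteria = {"petal_length": [4.0, 6.0], "class": None}
print([case for case in cases if matches(case, criteria)])
```

The same criteria dict could then be passed as the criteria argument of partial_unfit.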

predict(X)

Make predictions using Howso.

Parameters:

X (numpy.ndarray, shape (n_samples, n_features)) – Data

Returns:

The predicted values based on the feature values provided.

Return type:

numpy.ndarray, shape (n_samples,)

react_into_features(features=None, *, distance_contribution=False, familiarity_conviction_addition=False, familiarity_conviction_removal=False, influence_weight_entropy=False, p_value_of_addition=False, p_value_of_removal=False, similarity_conviction=False, use_case_weights=False, weight_feature=None)

Calculate conviction and other data and store the results in features.

Parameters:
  • features (list of str) – A list of the feature names to use when calculating conviction.

  • distance_contribution (bool or str, default False) – The name of the feature to store distance contribution. If set to True the values will be stored to the feature ‘distance_contribution’.

  • familiarity_conviction_addition (bool or str, default False) – The name of the feature to store conviction of addition values. If set to True the values will be stored to the feature ‘familiarity_conviction_addition’.

  • familiarity_conviction_removal (bool or str, default False) – The name of the feature to store conviction of removal values. If set to True the values will be stored to the feature ‘familiarity_conviction_removal’.

  • influence_weight_entropy (bool or str, default False) – The name of the feature to store influence weight entropy values in. If set to True, the values will be stored in the feature ‘influence_weight_entropy’.

  • p_value_of_addition (bool or str, default False) – The name of the feature to store p value of addition values. If set to True the values will be stored to the feature ‘p_value_of_addition’.

  • p_value_of_removal (bool or str, default False) – The name of the feature to store p value of removal values. If set to True the values will be stored to the feature ‘p_value_of_removal’.

  • similarity_conviction (bool or str, default False) – The name of the feature to store similarity conviction values. If set to True the values will be stored to the feature ‘similarity_conviction’.

  • use_case_weights (bool, default False) – When True, will scale influence weights by each case’s weight_feature weight.

  • weight_feature (str, optional) – Name of feature whose values to use as case weights. When left unspecified uses the internally managed case weight.

Return type:

None
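Each bool-or-str option of react_into_features resolves to a destination feature name in the same way: True stores under the option's own name, and a string names the feature. The hypothetical helper below mirrors that rule and additionally assumes that False (the default) means the value is not computed. The requested mapping corresponds to a call such as react_into_features(familiarity_conviction_addition=True, similarity_conviction='sim_conv').

```python
from typing import Dict, Optional, Union


def destination_feature(option: str, value: Union[bool, str]) -> Optional[str]:
    """Return the feature a result is stored in, or None if it is not computed."""
    if value is True:
        return option  # True: store under the option's own name
    if value is False:
        return None  # False (the default): assumed not computed
    return value  # a string names a custom destination feature


requested: Dict[str, Union[bool, str]] = {
    "familiarity_conviction_addition": True,
    "similarity_conviction": "sim_conv",
    "distance_contribution": False,
}
for option, value in requested.items():
    print(option, "->", destination_feature(option, value))
```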

release_resources()

Release trainee resources created by this estimator.

If this estimator’s trainee is named (self._trainee_name is not None), an attempt is made to persist the trainee to disk and release its resources. If the data persistence policy forbids this, that call returns an error, and the trainee is deleted with delete_trainee() instead.

NOTE: Errors are handled immediately because this is the instance’s destructor. There is no further recourse at this point.

Return type:

None

save()

Persist the trainee.

By default model resources are released after a short period of time. This method saves the model persistently to allow releasing trainee resources while keeping the model available for use later.

If this trainee has not already been named, this method sets a randomly generated name.

Raises:
  • HowsoNotUniqueError – If unable to set the trainee name within RENAME_RETRIES retries.

  • Exception – If unable to persist the trainee.

Return type:

None
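The naming behavior described for save() (keep an existing name, otherwise try randomly generated names up to a retry limit) can be sketched as follows. Everything here is hypothetical: the name format, the try_set_name callback, the retry count, and NameNotUniqueError, which stands in for HowsoNotUniqueError. The actual value of RENAME_RETRIES and the real naming scheme are not documented above.

```python
import uuid
from typing import Callable, Optional, Set


class NameNotUniqueError(Exception):
    """Stand-in for HowsoNotUniqueError in this sketch."""


def ensure_name(
    current_name: Optional[str],
    try_set_name: Callable[[str], bool],
    retries: int = 5,
) -> str:
    """Keep an existing name, or try random names until one is accepted."""
    if current_name is not None:
        return current_name
    for _ in range(retries):
        candidate = f"trainee-{uuid.uuid4().hex[:8]}"
        # try_set_name returns False when the name is already taken.
        if try_set_name(candidate):
            return candidate
    raise NameNotUniqueError(f"no unique name found in {retries} attempts")


taken: Set[str] = set()


def register(name: str) -> bool:
    """Toy name registry: accept a name only if it is unused."""
    if name in taken:
        return False
    taken.add(name)
    return True


print(ensure_name(None, register))
```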

score(X, y)

Score Howso.

For classifiers, accuracy is calculated. For regressors, R^2 is calculated.

Parameters:
  • X (numpy.ndarray, shape (n_samples, n_features)) – Test samples.

  • y (numpy.ndarray, shape (n_samples,) or (n_samples, n_outputs)) – True values for X.

Returns:

The accuracy (for classifiers) or R^2 (for regressors).

Return type:

float
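For reference, the two metrics that score reports can be computed by hand. These are the standard definitions of accuracy and the coefficient of determination (R^2); whether HowsoEstimator.score computes them exactly this way is not stated here, so treat this as an illustration of the metrics rather than of Howso's implementation.

```python
from typing import Sequence


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of predictions that exactly match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined (division by zero) when all true values are equal.
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


print(accuracy([0, 1, 1, 2], [0, 1, 2, 2]))  # 0.75
print(r2([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 1.0
```

Predicting the mean of y for every sample gives an R^2 of 0, and worse predictions give negative values, which is why R^2 is not interchangeable with mean squared error.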

set_fit_request(*, analyze: bool | None | str = '$UNCHANGED$') HowsoEstimator

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

analyze (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for analyze parameter in fit.

Returns:

self – The updated object.

Return type:

object

property trainee_id: str | None

Return the trainee’s ID, if possible.

property trainee_name: str | None

Return the trainee name (getter).

class howso.scikit.HowsoRegressor(client=None, features=None, targets=None, verbose=False, debug=False, ttl=43200000, client_params=None, trainee_params=None)

Bases: HowsoEstimator

A HowsoEstimator for regression analysis.

Parameters:
  • features (dict of str: dict, default None) –

    The features that will predict the target(s). Will be generated automatically if not specified.

    Example:

    {
        "feature_name": {
            "parameter1" : "value1",
            "parameter2" : "value2"
        },
        "length": { "type" : "continuous", "decimal_places": 1 },
        "width": { "type" : "continuous", "significant_digits": 4 },
        "degrees": { "type" : "continuous", "cycle_length": 360 },
        "class": { "type" : "nominal" }
    }
    

  • targets (dict of str: dict, default None) –

    The target(s) to be predicted. Will be generated automatically if not specified.

    Example:

    {
        "target_name": {
            "parameter1" : "value1",
            "parameter2" : "value2"
        },
        "klass": { "type" : "nominal" }
    }
    

  • client (AbstractHowsoClient, default None) – A subclass of AbstractHowsoClient used to interface with Howso.

  • verbose (boolean, default False) – A flag for verbose output.

  • debug (boolean, default False) – A flag for debug output.

  • ttl (int, in milliseconds, default 43200000) – The maximum time a server should keep a connection open for a trainee when processing requests. The default is 12 hours.

  • client_params (dict, default None) – The parameters with which to instantiate the client.

  • trainee_params (dict, default None) – The parameters with which to instantiate the trainee. Intended for use by HowsoEstimator.get_params.

set_fit_request(*, analyze: bool | None | str = '$UNCHANGED$') HowsoRegressor

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

analyze (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for analyze parameter in fit.

Returns:

self – The updated object.

Return type:

object