howso.engine
Submodules
The typing module: Support for gradual typing as defined by PEP 484 and subsequent PEPs.
Classes
A Howso Project.
A Howso Session.
A Howso Trainee.
Functions
Delete an existing project.
Delete an existing Trainee.
Get the active session.
Get the active Howso client instance.
Get an existing project.
Get an existing Session.
Get an existing trainee from Howso Services.
Query accessible Projects.
Query accessible Sessions.
Query accessible Trainees.
Load an existing trainee from disk.
Query accessible Projects.
Query accessible Sessions.
Query accessible Trainees.
Set the active project.
Set the active Howso client instance to use for the API.
The Python API for the Howso Engine Client.
- class howso.engine.Project(name, *, id=None, client=None)
Bases:
Project
A Howso Project.
A Project is a container for a collection of Trainees. It controls who may view and modify those Trainees based on each user’s membership access to the project.
- Parameters:
name (str) – The name of the project.
client (ProjectClient | None, default: None) – The Howso client instance to use. Must support the Project API.
id (str | None, default: None)
- delete()
Delete the project.
Projects may only be deleted when they have no trainees in them.
- Return type:
None
- classmethod from_dict(schema)
Return a new Project using properties from a dict.
- Parameters:
schema (Mapping)
- classmethod from_schema(schema, *, client=None)
Create a Project from a base-class instance.
- Parameters:
schema (BaseSchema) – The base Project object.
client (ProjectClient | None, default: None) – The Howso client instance to use.
- Returns:
The Project instance.
- Return type:
- property client: ProjectClient
The client instance used by the project.
- Returns:
The client instance.
- property name: str
The name of the Project.
- Returns:
The Project name.
- class howso.engine.Session(name=None, *, id=None, metadata=None, client=None)
Bases:
Session
A Howso Session.
- Parameters:
name (str | None, default: None) – The name of the session.
metadata (dict | None, default: None) – Any key-value pairs to store as custom metadata for the session.
client (AbstractHowsoClient | None, default: None) – The Howso client instance to use.
id (str | UUID | None, default: None)
- classmethod from_dict(schema)
Return a new Session using properties from a dict.
- Parameters:
schema (Mapping)
- classmethod from_schema(schema, *, client=None)
Create a Session from a base-class instance.
- Parameters:
schema (Session) – The base Session object.
client (AbstractHowsoClient | None, default: None) – The Howso client instance to use.
- Returns:
The Session instance.
- Return type:
- set_metadata(metadata)
Update the session metadata.
- Parameters:
metadata (dict | None) – Any key-value pairs to store as custom metadata for the session. Providing None removes the current metadata.
- Return type:
None
- property client: AbstractHowsoClient
The client instance used by the session.
- Returns:
The client instance.
- property metadata: dict | None
The Session metadata.
Warning
This returns a deep copy of the metadata. To update the metadata of the session, use the method set_metadata().
- Returns:
The metadata of the Session.
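The Warning above means that in-place edits to the returned dict are not saved. The sketch below illustrates that copy-on-read behavior with a hypothetical `SessionLike` stand-in class; it is not the howso `Session` (which also talks to a client), and only the `metadata` property and `set_metadata()` names come from this reference.

```python
from __future__ import annotations

import copy
from typing import Any


class SessionLike:
    """Hypothetical stand-in for a session; not the howso Session class."""

    def __init__(self, metadata: dict[str, Any] | None = None) -> None:
        self._metadata = copy.deepcopy(metadata)

    @property
    def metadata(self) -> dict[str, Any] | None:
        # Return a deep copy so callers cannot mutate the stored metadata.
        return copy.deepcopy(self._metadata)

    def set_metadata(self, metadata: dict[str, Any] | None) -> None:
        # Replace the stored metadata; None clears it.
        self._metadata = copy.deepcopy(metadata)


session = SessionLike({"owner": "alice", "tags": ["demo"]})
md = session.metadata
md["tags"].append("edited")        # mutates only the copy
print(session.metadata["tags"])    # ['demo']

md["owner"] = "bob"
session.set_metadata(md)           # the supported way to persist changes
print(session.metadata["owner"])   # bob
```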
- class howso.engine.Trainee(name=None, features=None, *, overwrite_existing=False, persistence='allow', id=None, library_type=None, max_wait_time=None, metadata=None, project=None, resources=None, client=None)
Bases:
Trainee
A Howso Trainee.
A Trainee is most closely related to what would normally be called a ‘model’ in machine learning. It contains feature information, training cases, session data, parameters, and other metadata. A Trainee is a little more abstract than a model, which is why the two terms are not used interchangeably.
- Parameters:
name (str | None, default: None) – The name of the trainee.
features (Mapping[str, Mapping] | SingleTableFeatureAttributes | None, default: None) – The feature attributes of the trainee, where the feature name is the key and a sub-dictionary of feature attributes is the value. If this is not specified in the constructor, it must be set during or before train().
id (str | None, default: None) – The unique identifier of the Trainee. The client populates this field automatically; users should NOT set it manually. Use the name parameter to specify a Trainee name.
library_type (Literal['st', 'mt'] | None, default: None) – The library type of the Trainee. “st” uses the single-threaded library, while “mt” uses the multi-threaded library.
max_wait_time (int | float | None, default: None) – The number of seconds to wait for a trainee to be created and become available before aborting gracefully. Set to 0 (or None) to wait as long as the system-configured maximum for sufficient resources to become available, which is typically 20 minutes.
persistence (Literal['allow', 'always', 'never'], default: 'allow') – The requested persistence state of the trainee.
project (str | Project | None, default: None) – The instance or id of the project to use for the trainee.
metadata (Mapping[str, Any] | None, default: None) – Any key-value pairs to store as custom metadata for the trainee.
resources (Mapping[str, Any] | None, default: None) – Customize the resources provisioned for the Trainee instance.
client (AbstractHowsoClient | None, default: None) – The Howso client instance to use.
overwrite_existing (bool, default: False) – Overwrite an existing trainee with the same name (if one exists).
- acquire_resources(*, max_wait_time=None)
Acquire resources for a trainee in the Howso service.
- Parameters:
max_wait_time (int | float | None, default: None) – The number of seconds to wait for trainee resources to be acquired before aborting gracefully. Set to 0 (or None) to wait as long as the system-configured maximum for sufficient resources to become available, which is typically 20 minutes.
- add_feature(feature, feature_value=None, *, overwrite=False, condition=None, condition_session=None, feature_attributes=None)
Add a feature to the model.
Updates the accumulated data mass for the model proportional to the number of cases modified.
- Parameters:
feature (str) – The name of the feature.
feature_attributes (Mapping[str, Any] | None, default: None) – The dict of feature-specific attributes for this feature. If unspecified and no condition is specified, the feature type is assumed to be ‘continuous’.
feature_value (int | float | str | None, default: None) – The value to populate the feature with. By default, the new feature is populated with None.
condition (Mapping[str, Any] | None, default: None) – A condition map where feature values will only be added when certain criteria are met.
If None, the feature will be added to all cases in the model and the feature metadata will be updated to include it. If specified as an empty dict, the feature will still be added to all cases in the model, but the feature metadata will not be updated.
Note
The dictionary keys are the feature names and the values are one of:
None
A value, which must match exactly.
An array of two numeric values, specifying an inclusive range. Only applicable to continuous and numeric ordinal features.
An array of string values, one of which must match exactly. Only applicable to nominal and string ordinal features.
Tip
For instance, to add the feature_value only when the length and width features are equal to 10:
condition = {"length": 10, "width": 10}
condition_session (str | Session | None, default: None) – If specified, ignores the condition and operates on cases for the specified session id or Session instance.
overwrite (bool, default: False) – If True, the feature will be overwritten if it exists.
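The value forms a condition map accepts can be illustrated with a small pure-Python matcher over dict-shaped cases. `matches` is a hypothetical helper, not Howso's implementation, and treating None as "any case that has this feature" is an assumption drawn from the get_cases examples.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any


def matches(case: Mapping[str, Any], condition: Mapping[str, Any]) -> bool:
    """Return True when `case` satisfies every entry in `condition` (hypothetical helper)."""
    for feature, criterion in condition.items():
        if feature not in case:
            return False
        value = case[feature]
        if criterion is None:
            # Assumption: None selects any case that has this feature.
            continue
        if isinstance(criterion, (list, tuple)):
            is_range = len(criterion) == 2 and all(
                isinstance(c, (int, float)) for c in criterion
            )
            if is_range:
                low, high = criterion
                # Two numbers: inclusive range [low, high].
                if not isinstance(value, (int, float)) or not low <= value <= high:
                    return False
            elif value not in criterion:
                # Several strings: the value must equal one of them.
                return False
        elif value != criterion:
            # A single value: exact match.
            return False
    return True


cases = [
    {"length": 10, "width": 10, "color": "a"},
    {"length": 10, "width": 12, "color": "c"},
    {"length": 25, "width": 10, "color": "z"},
]
print([matches(c, {"length": 10, "width": 10}) for c in cases])  # [True, False, False]
print([matches(c, {"width": [10, 12]}) for c in cases])          # [True, True, True]
print([matches(c, {"color": ["a", "c", "e"]}) for c in cases])   # [True, True, False]
```

Note that an empty condition matches every case here, which is consistent with the description of passing an empty dict to add_feature.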
- analyze(context_features=None, action_features=None, *, bypass_calculate_feature_residuals=None, bypass_calculate_feature_weights=None, bypass_hyperparameter_analysis=None, dt_values=None, inverse_residuals_as_weights=None, k_folds=None, k_values=None, num_analysis_samples=None, num_samples=None, analysis_sub_model_size=None, p_values=None, targeted_model=None, use_case_weights=None, use_deviations=None, weight_feature=None, **kwargs)
Analyze the trainee.
- Parameters:
context_features (Collection[str] | None, default: None) – The context features to analyze for.
action_features (Collection[str] | None, default: None) – The action features to analyze for.
bypass_calculate_feature_residuals (bool | None, default: None) – When True, bypasses calculation of feature residuals.
bypass_calculate_feature_weights (bool | None, default: None) – When True, bypasses calculation of feature weights.
bypass_hyperparameter_analysis (bool | None, default: None) – When True, bypasses hyperparameter analysis.
dt_values (Collection[float] | None, default: None) – The dt value hyperparameters to analyze with.
inverse_residuals_as_weights (bool | None, default: None) – When True, computes and uses the inverse of the residuals as feature weights.
k_folds (int | None, default: None) – The number of cross-validation folds to use. A value of 1 performs hold-one-out instead of k-fold.
k_values (Collection[int] | None, default: None) – The k value hyperparameters to analyze with.
num_analysis_samples (int | None, default: None) – The number of observations to consider for analysis.
num_samples (int | None, default: None) – The number of samples used in calculating feature residuals.
analysis_sub_model_size (int | None, default: None) – The number of samples to use for analysis. The rest will be randomly held out and not included in calculations.
p_values (Collection[float] | None, default: None) – The p value hyperparameters to analyze with.
targeted_model (Literal['single_targeted', 'omni_targeted', 'targetless'] | None, default: None) – The type of hyperparameter targeting. Valid options are:
single_targeted: Analyze hyperparameters for the specified action_features.
omni_targeted: Analyze hyperparameters for each context feature as an action feature; ignores the action_features parameter.
targetless: Analyze hyperparameters for all context features as possible action features; ignores the action_features parameter.
use_case_weights (bool | None, default: None) – If set to True, scales influence weights by each case’s weight_feature weight. If unspecified, case weights will be used if the Trainee has them.
use_deviations (bool | None, default: None) – When True, uses deviations for the LK metric in queries.
weight_feature (str | None, default: None) – The name of the feature whose values to use as case weights. When left unspecified, uses the internally managed case weight.
**kwargs – Additional experimental analyze parameters.
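One way to read the three targeted_model options is in terms of which features each analyzed hyperparameter set targets. The sketch below is a hypothetical helper that encodes only the option descriptions above; how Howso actually groups and stores hyperparameter sets may differ.

```python
from __future__ import annotations

from collections.abc import Sequence
from typing import Literal

TargetedModel = Literal["single_targeted", "omni_targeted", "targetless"]


def hyperparameter_targets(
    targeted_model: TargetedModel,
    context_features: Sequence[str],
    action_features: Sequence[str] | None = None,
) -> list[tuple[str, ...]]:
    """Return the feature groups each hyperparameter set targets (hypothetical reading)."""
    if targeted_model == "single_targeted":
        if not action_features:
            raise ValueError("single_targeted needs action_features")
        # Hyperparameters aimed at the specified action features.
        return [tuple(action_features)]
    if targeted_model == "omni_targeted":
        # Each context feature in turn is the action feature; action_features is ignored.
        return [(feature,) for feature in context_features]
    if targeted_model == "targetless":
        # Every context feature is a possible action feature; action_features is ignored.
        return [tuple(context_features)]
    raise ValueError(f"unknown targeted_model: {targeted_model!r}")


features = ["age", "income", "score"]
print(hyperparameter_targets("single_targeted", features, ["score"]))  # [('score',)]
print(hyperparameter_targets("omni_targeted", features, ["score"]))    # [('age',), ('income',), ('score',)]
print(hyperparameter_targets("targetless", features))                  # [('age', 'income', 'score')]
```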
- append_to_series_store(series, contexts, *, context_features=None)
Append the specified contexts to a series store.
For use with train series.
- Parameters:
series (str) – The name of the series store to append to.
contexts (DataFrame | list[list[Any]]) – The list of context values to append to the series.
context_features (Collection[str] | None, default: None) – The list of feature names for contexts.
- auto_analyze()
Auto-analyze the trainee.
Reuses all parameters from the previous analyze() call, assuming that the user has called analyze() before. If not, it defaults to a robust and versatile analysis.
- Return type:
None
- clear_imputed_data(impute_session=None)
Clear values that were imputed during a specified session.
Does not clear values that were manually set by the user after the impute.
- Parameters:
impute_session (str | Session | None, default: None) – The session or session identifier of the impute for which to clear the data. If none is provided, all imputed values will be cleared.
- copy(name=None, *, library_type=None, project=None, resources=None)
Copy the trainee to another trainee.
- Parameters:
name (str | None, default: None) – The name of the new trainee.
library_type (Literal['st', 'mt'] | None, default: None) – The library type of the Trainee. “st” uses the single-threaded library, while “mt” uses the multi-threaded library.
project (str | Project | None, default: None) – The instance or id of the project to use for the new trainee.
resources (Mapping[str, Any] | None, default: None) – Customize the resources provisioned for the Trainee instance. If not specified, the new trainee inherits the value from the original.
- Returns:
The new trainee copy.
- Return type:
- delete()
Delete the trainee from the last loaded or saved location.
To delete a trainee from another location, see delete_trainee().
- delete_session(session)
Delete a session from the trainee.
- Parameters:
session (str | Session) – The id or instance of the session to remove from the model.
- edit_cases(feature_values, *, case_indices=None, condition=None, condition_session=None, features=None, num_cases=None, precision=None)
Edit feature values for the specified cases.
Updates the accumulated data mass for the model proportional to the number of cases and features modified.
- Parameters:
feature_values (DataFrame | list[list[Any]]) – The feature values to edit the case(s) with. If specified as a list, the order corresponds with the order of the features parameter. If specified as a DataFrame, only the first row will be used.
case_indices (Sequence[tuple[str, int]] | None, default: None) – An iterable of sequences containing the session id and index, where index is the original 0-based index of the case as it was trained into the session. This explicitly specifies the cases to edit. When specified, condition and condition_session are ignored.
condition (Mapping[str, Any] | None, default: None) – A condition map to select which cases to edit. Ignored when case_indices are specified.
Note
The dictionary keys are the feature names and the values are one of:
None
A value, which must match exactly.
An array of two numeric values, specifying an inclusive range. Only applicable to continuous and numeric ordinal features.
An array of string values, one of which must match exactly. Only applicable to nominal and string ordinal features.
condition_session (str | Session | None, default: None) – If specified, ignores the condition and operates on all cases for the specified session id or Session instance.
features (Collection[str] | None, default: None) – The names of the features to edit. Required when feature_values is not specified as a DataFrame.
num_cases (int | None, default: None) – The maximum number of cases to edit. If not specified, the limit will be k cases if precision is “similar”, or no limit if precision is “exact”.
precision (Literal['exact', 'similar'] | None, default: None) – The precision to use when selecting the cases to edit. Options are ‘exact’ or ‘similar’. If not specified, “exact” will be used.
- Returns:
The number of cases modified.
- Return type:
int
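The interplay of case_indices, condition_session, condition, num_cases, and precision described for edit_cases can be summarized in a small pure-Python sketch. `resolve_selection` and its `k` argument (a stand-in for the trainee's analyzed k value) are hypothetical illustrations of the documented precedence and default limit, not part of the howso API.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence
from typing import Any, Literal


def resolve_selection(
    *,
    k: int,
    case_indices: Sequence[tuple[str, int]] | None = None,
    condition: Mapping[str, Any] | None = None,
    condition_session: str | None = None,
    num_cases: int | None = None,
    precision: Literal["exact", "similar"] | None = None,
) -> tuple[str, Any, int | None]:
    """Return (selection mode, selector, case limit); a limit of None means unlimited."""
    precision = precision or "exact"  # documented default
    if precision not in ("exact", "similar"):
        raise ValueError(f"unknown precision: {precision!r}")
    # Default limit: k cases for "similar", no limit for "exact".
    limit = num_cases if num_cases is not None else (k if precision == "similar" else None)
    # The reference does not say whether the limit applies to explicit
    # case_indices; this sketch simply passes it through.
    if case_indices is not None:
        # case_indices wins: condition and condition_session are ignored.
        return "case_indices", list(case_indices), limit
    if condition_session is not None:
        # condition_session wins over condition.
        return "session", condition_session, limit
    return "condition", condition, limit


print(resolve_selection(k=8, case_indices=[("s1", 0)], condition={"a": 1}))
print(resolve_selection(k=8, condition={"a": 1}, condition_session="s2", precision="similar"))
```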
- evaluate(features_to_code_map, *, aggregation_code=None)
Evaluate custom code on the feature values of all cases in the trainee.
- Parameters:
features_to_code_map (Mapping[str, str]) – A dictionary with feature name keys and custom Amalgam code string values.
The custom code can use "#feature_name 0" to reference the value of that feature for each case.
aggregation_code (str | None, default: None) – A string of custom Amalgam code that can access the list of values derived from the custom code in features_to_code_map.
The custom code can use "#feature_name 0" to reference the list of values derived from using the custom code in features_to_code_map.
- Returns:
A dictionary with the keys ‘evaluated’ and ‘aggregated’.
‘evaluated’ is a dictionary with feature name keys and lists of values derived from the features_to_code_map custom code.
‘aggregated’ is None if no aggregation_code is given; otherwise, it holds the output of the custom aggregation_code.
- Return type:
Evaluation
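The shape of the evaluate() result can be mirrored in Python. In this hypothetical analog, `evaluate_like` takes plain Python callables in place of Amalgam code strings, so it illustrates only the ‘evaluated’/‘aggregated’ structure, not Amalgam semantics.

```python
from __future__ import annotations

from collections.abc import Callable, Mapping
from typing import Any


def evaluate_like(
    cases: list[Mapping[str, Any]],
    features_to_code_map: Mapping[str, Callable[[Mapping[str, Any]], Any]],
    aggregation: Callable[[dict[str, list[Any]]], Any] | None = None,
) -> dict[str, Any]:
    # One list of derived values per key, computed case by case.
    evaluated = {
        name: [code(case) for case in cases]
        for name, code in features_to_code_map.items()
    }
    # As documented, 'aggregated' is None when no aggregation code is given.
    aggregated = aggregation(evaluated) if aggregation is not None else None
    return {"evaluated": evaluated, "aggregated": aggregated}


cases = [{"x": 1, "y": 2}, {"x": 3, "y": 4}]
result = evaluate_like(
    cases,
    {"x": lambda case: case["x"] * 2},
    aggregation=lambda evaluated: sum(evaluated["x"]),
)
print(result)  # {'evaluated': {'x': [2, 6]}, 'aggregated': 8}
```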
- classmethod from_dict(schema)
Create a Trainee from a Mapping.
- Parameters:
schema (Mapping) – The Trainee parameters.
- Returns:
The trainee instance.
- Return type:
- classmethod from_schema(schema, *, client=None)
Create a Trainee from a base-class instance.
- Parameters:
schema (Trainee) – The base Trainee object.
client (AbstractHowsoClient | None, default: None) – The Howso client instance to use.
- Returns:
The Trainee instance.
- Return type:
- get_auto_ablation_params()
Get the trainee parameters for auto-ablation set by set_auto_ablation_params().
- Returns:
A dictionary mapping parameter names to parameter values.
- Return type:
dict[str, Any]
- get_cases(*, indicate_imputed=False, case_indices=None, features=None, session=None, condition=None, num_cases=None, precision=None)
Get the trainee’s cases.
- Parameters:
case_indices (Sequence[tuple[str, int]] | None, default: None) – A list of tuples of session id and index, where index is the original 0-based index of the case as it was trained into the session. If specified, returns only these cases and ignores the session parameter.
Note
If case_indices are provided, condition (and precision) are ignored.
features (Collection[str] | None, default: None) – A list of feature names to return values for in lieu of all default features.
Built-in features that are available for retrieval:
.session - The session id the case was trained under.
.session_training_index - The 0-based original index of the case, ordered by training during the session; it never changes.
indicate_imputed (bool, default: False) – If True, an additional value will be appended to the cases indicating whether the case was imputed.
session (str | Session | None, default: None) – The id or instance of the session to retrieve training indices for from the model.
Note
If a session is not provided, the order of the cases is not guaranteed to be the same as the order they were trained into the model.
condition (Mapping[str, Any] | None, default: None) – The condition map to select the cases to retrieve that meet all the provided conditions.
Note
The dictionary keys are the feature names and the values are one of:
None
A value, which must match exactly.
An array of two numeric values, specifying an inclusive range. Only applicable to continuous and numeric ordinal features.
An array of string values, one of which must match exactly. Only applicable to nominal and string ordinal features.
Note
This option will be ignored if case_indices is supplied.
Tip
Example 1 - Retrieve all values belonging to feature_name:
condition = {"feature_name": None}
Example 2 - Retrieve cases that have the value 10:
condition = {"feature_name": 10}
Example 3 - Retrieve cases that have a value in the range [10, 20]:
condition = {"feature_name": [10, 20]}
Example 4 - Retrieve cases that match one of ['a', 'c', 'e']:
condition = {"feature_name": ['a', 'c', 'e']}
Example 5 - Retrieve a case by its session id and training index:
condition = {'.session': 'your_session_id', '.session_training_index': 1}
num_cases (int | None, default: None) – The maximum number of cases to retrieve. If not specified, the limit will be k cases if precision is “similar”, or no limit if precision is “exact”.
precision (Literal['exact', 'similar'] | None, default: None) – The precision to use when retrieving the cases via condition. Options are ‘exact’ or ‘similar’. If not specified, “exact” will be used.
- Returns:
The trainee’s cases.
- Return type:
DataFrame
- get_contribution_matrix(features=None, *, directional=False, robust=True, targeted=False, normalize=True, fill_diagonal=True, fill_diagonal_value=1)
Get the Feature Contribution matrix.
- Parameters:
features (Iterable[str] | None, default: None) – An iterable of feature names. If features are not provided, the default trainee features will be used.
directional (bool, default: False) – Whether to get the matrix for the directional feature contributions or the absolute feature contributions.
robust (bool, default: True) – Whether to use robust calculations.
targeted (bool, default: False) – Whether to do a targeted re-analyze before each feature’s contribution is calculated.
normalize (bool, default: True) – Whether to normalize the matrix row-wise. If True, normalizes each row by dividing each value by the sum of the values in the row, so the fractional values sum to 1.
fill_diagonal (bool, default: True) – Whether to fill in the diagonals of the matrix. If set to True, the diagonal values will be filled in based on the fill_diagonal_value value.
fill_diagonal_value (float | int, default: 1) – The value to fill in the diagonals with. fill_diagonal must be set to True for the diagonal values to be filled in. If fill_diagonal is set to False, this parameter is ignored.
- Returns:
The Feature Contribution matrix in a DataFrame.
- Return type:
DataFrame
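As a sketch of what normalize=True and fill_diagonal do to a contribution matrix, the hypothetical helpers below operate on plain nested lists rather than the DataFrame the method returns: they divide each row by its sum and then overwrite the diagonal. The reference does not state the order of these two steps; filling after normalizing is an assumption, and rows that sum to zero are left unchanged.

```python
from __future__ import annotations


def normalize_rows_fractional(matrix: list[list[float]]) -> list[list[float]]:
    """Divide each value by its row sum so every nonzero-sum row sums to 1."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([v / total for v in row] if total else list(row))
    return out


def fill_diagonal(matrix: list[list[float]], value: float = 1) -> list[list[float]]:
    """Return a copy of a square matrix with the main diagonal set to `value`."""
    return [
        [value if i == j else v for j, v in enumerate(row)]
        for i, row in enumerate(matrix)
    ]


raw = [[0.0, 2.0, 6.0], [1.0, 0.0, 3.0], [4.0, 4.0, 0.0]]
m = fill_diagonal(normalize_rows_fractional(raw))
print(m)  # [[1, 0.25, 0.75], [0.25, 1, 0.75], [0.5, 0.5, 1]]
```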
- get_distances(features=None, *, use_case_weights=None, action_feature=None, case_indices=None, feature_values=None, weight_feature=None)
Compute a distance matrix for the specified cases.
Returns a dict with computed distances between all cases specified in case_indices, or from all cases in the local model as defined by feature_values.
- Parameters:
features (Collection[str] | None, default: None) – A list of feature names to use when computing distances. If unspecified, uses all features.
action_feature (str | None, default: None) – The action feature. If specified, uses the targeted hyperparameters used to predict this action_feature; otherwise uses targetless hyperparameters.
case_indices (Sequence[tuple[str, int]] | None, default: None) – A list of tuples of session id and index, where index is the original 0-based index of the case as it was trained into the session. If specified, returns distances for all of these cases. Ignored if feature_values is provided. If neither feature_values nor case_indices is specified, uses the full dataset.
feature_values (Collection[Any] | DataFrame | None, default: None) – If specified, returns distances of the local model relative to these values and ignores the case_indices parameter. If provided a DataFrame, only the first row will be used.
use_case_weights (bool | None, default: None) – If set to True, scales influence weights by each case’s weight_feature weight. If unspecified, case weights will be used if the Trainee has them.
weight_feature (str | None, default: None) – The name of the feature whose values to use as case weights. When left unspecified, uses the internally managed case weight.
- Returns:
A dict containing a matrix of computed distances and the list of corresponding case indices in the following format:
{ 'case_indices': [ session-indices ], 'distances': DataFrame( distances ) }
- Return type:
Distances
- get_extreme_cases(*, features=None, num, sort_feature)
Get the trainee’s extreme cases.
- Parameters:
features (Collection[str] | None, default: None) – The features to include in the case data.
num (int) – The number of cases to get.
sort_feature (str) – The name of the feature by which extreme cases are sorted.
- Returns:
The trainee’s extreme cases.
- Return type:
DataFrame
- get_feature_conviction(*, familiarity_conviction_addition=True, familiarity_conviction_removal=False, use_case_weights=None, action_features=None, features=None, weight_feature=None)
Get the familiarity conviction for features in the model.
- Parameters:
action_features (Collection[str] | None, default: None) – The feature names to be treated as action features during conviction calculation, in order to determine the conviction of each feature against the set of action_features. If not specified, conviction is computed for each feature against the rest of the features as a whole.
familiarity_conviction_addition (bool, default: True) – Calculate and output the familiarity conviction of adding the specified cases.
familiarity_conviction_removal (bool, default: False) – Calculate and output the familiarity conviction of removing the specified cases.
features (Collection[str] | None, default: None) – The feature names to calculate convictions for. At least 2 features are required to get familiarity conviction. If not specified, all features will be used.
use_case_weights (bool | None, default: None) – When True, scales influence weights by each case’s weight_feature weight. If unspecified, case weights will be used if the Trainee has them.
weight_feature (str | None, default: None) – The name of the feature whose values to use as case weights. When left unspecified, uses the internally managed case weight.
- Returns:
A DataFrame containing the familiarity conviction rows to feature columns.
- Return type:
DataFrame | dict
- get_marginal_stats(*, condition=None, num_cases=None, precision=None, weight_feature=None)
Get marginal stats for all features.
- Parameters:
condition (Mapping[str, Any] | None, default: None) – A condition map to select which cases to compute marginal stats for.
Note
The dictionary keys are the feature names and the values are one of:
None
A value, which must match exactly.
An array of two numeric values, specifying an inclusive range. Only applicable to continuous and numeric ordinal features.
An array of string values, one of which must match exactly. Only applicable to nominal and string ordinal features.
num_cases (int | None, default: None) – The maximum number of cases to use to calculate marginal stats. If not specified, the limit will be k cases if precision is “similar”. Only used if condition is not None.
precision (Literal['exact', 'similar'] | None, default: None) – The precision to use when selecting cases with the condition. Options are ‘exact’ or ‘similar’. If not specified, “exact” will be used. Only used if condition is not None.
weight_feature (str | None, default: None) – When specified, will attempt to return stats that were computed using this weight_feature.
- Returns:
A DataFrame of feature name columns to stat value rows. Indexed by the stat type. The return type depends on the underlying client.
- Return type:
DataFrame
- get_mda_matrix(features=None, *, robust=True, targeted=False, normalize=False, normalize_method='relative', absolute=False, fill_diagonal=True, fill_diagonal_value=1)
Get the Mean Decrease in Accuracy (MDA) matrix.
- Parameters:
features (Iterable[str] | None, default: None) – An iterable of feature names. If features are not provided, the default trainee features will be used.
robust (bool, default: True) – Whether to use robust calculations.
targeted (bool, default: False) – Whether to do a targeted re-analyze before each feature’s contribution is calculated.
normalize (bool, default: False) – Whether to normalize the matrix row-wise. The normalization method is set by the normalize_method parameter.
normalize_method (Literal['fractional_absolute', 'fractional', 'relative'] | Callable | Iterable[Literal['fractional_absolute', 'fractional', 'relative'] | Callable], default: 'relative') – The normalization method. The method may be either one of the strings below, which correspond to default methods, or a custom callable. Methods may be passed in individually or as an iterable, in which case they are processed sequentially.
Default methods:
‘relative’: normalizes each row by dividing each value by the maximum absolute value in the row.
‘fractional’: normalizes each row by dividing each value by the sum of the values in the row, so the relative values sum to 1.
‘fractional_absolute’: normalizes each row by dividing each value by the sum of the absolute values in the row.
Custom callable:
If a custom Callable is provided, it will be passed to the DataFrame apply function: matrix.apply(Callable)
absolute (bool, default: False) – Whether to transform the matrix values into their absolute values.
fill_diagonal (bool, default: True) – Whether to fill in the diagonals of the matrix. If set to True, the diagonal values will be filled in based on the fill_diagonal_value value.
fill_diagonal_value (float | int, default: 1) – The value to fill in the diagonals with. fill_diagonal must be set to True for the diagonal values to be filled in. If fill_diagonal is set to False, this parameter is ignored.
- Returns:
The MDA matrix in a DataFrame.
- Return type:
DataFrame
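The three default normalize_method options, and the rule that an iterable of methods is applied in order, can be sketched on plain nested lists. `normalize_matrix` and its helpers are hypothetical; unlike the DataFrame apply call described above, this sketch applies a custom callable to each row, and rows whose divisor is zero are left unchanged (both simplifying assumptions).

```python
from __future__ import annotations

from collections.abc import Callable, Iterable

Row = list[float]
Matrix = list[Row]
RowFn = Callable[[Row], Row]


def _relative(row: Row) -> Row:
    # Divide each value by the largest absolute value in the row.
    peak = max((abs(v) for v in row), default=0.0)
    return [v / peak for v in row] if peak else list(row)


def _fractional(row: Row) -> Row:
    # Divide each value by the row sum, so the row sums to 1.
    total = sum(row)
    return [v / total for v in row] if total else list(row)


def _fractional_absolute(row: Row) -> Row:
    # Divide each value by the sum of absolute values in the row.
    total = sum(abs(v) for v in row)
    return [v / total for v in row] if total else list(row)


DEFAULT_METHODS: dict[str, RowFn] = {
    "relative": _relative,
    "fractional": _fractional,
    "fractional_absolute": _fractional_absolute,
}


def normalize_matrix(
    matrix: Matrix,
    method: str | RowFn | Iterable[str | RowFn] = "relative",
) -> Matrix:
    # A single name or callable is a one-step pipeline; an iterable runs in order.
    steps = [method] if isinstance(method, str) or callable(method) else list(method)
    for step in steps:
        fn = DEFAULT_METHODS[step] if isinstance(step, str) else step
        matrix = [fn(row) for row in matrix]
    return matrix


mda = [[-1.0, 1.0, 2.0], [2.0, 0.0, 2.0]]
print(normalize_matrix(mda, "fractional_absolute"))                # [[-0.25, 0.25, 0.5], [0.5, 0.0, 0.5]]
print(normalize_matrix(mda, ["fractional_absolute", "relative"]))  # [[-0.5, 0.5, 1.0], [1.0, 0.0, 1.0]]
```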
- get_num_training_cases()
Return the number of trained cases for the trainee.
- Returns:
The number of trained cases.
- Return type:
int
- get_pairwise_distances(features=None, *, use_case_weights=None, action_feature=None, from_case_indices=None, from_values=None, to_case_indices=None, to_values=None, weight_feature=None)
Compute pairwise distances between the specified cases.
Returns a list of computed distances between each respective pair of cases specified in either from_values or from_case_indices to to_values or to_case_indices. If only one case is specified in any of the lists, all respective distances are computed to/from that one case.
Note
Exactly one of from_values or from_case_indices must be specified.
Exactly one of to_values or to_case_indices must be specified.
- Parameters:
features (Collection[str] | None, default: None) – A list of feature names to use when computing pairwise distances. If unspecified, uses all features.
action_feature (str | None, default: None) – The action feature. If specified, uses the targeted hyperparameters used to predict this action_feature; otherwise uses targetless hyperparameters.
from_case_indices (Sequence[tuple[str, int]] | None, default: None) – An iterable of sequences of session id and index, where index is the original 0-based index of the case as it was trained into the session. If specified, its length must be either 1 or match the length of to_values or to_case_indices.
from_values (DataFrame | list[list[Any]] | None, default: None) – A 2D list of case values. If specified, its length must be either 1 or match the length of to_values or to_case_indices.
to_case_indices (Sequence[tuple[str, int]] | None, default: None) – An iterable of sequences of session id and index, where index is the original 0-based index of the case as it was trained into the session. If specified, its length must be either 1 or match the length of from_values or from_case_indices.
to_values (DataFrame | list[list[Any]] | None, default: None) – A 2D list of case values. If specified, its length must be either 1 or match the length of from_values or from_case_indices.
use_case_weights (bool | None, default: None) – If set to True, scales influence weights by each case’s weight_feature weight. If unspecified, case weights will be used if the Trainee has them.
weight_feature (str | None, default: None) – The name of the feature whose values to use as case weights. When left unspecified, uses the internally managed case weight.
- Returns:
A list of computed pairwise distances between each corresponding pair of cases in the “from” set (from_case_indices or from_values) and the “to” set (to_case_indices or to_values).
- Return type:
list[float]
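The pairing rule above (equal-length lists pair up element by element, and a single case is paired with every case on the other side) can be sketched as follows. `pair_up` and `pairwise_distances` are hypothetical helpers, and `math.dist` (Euclidean distance) is only a placeholder for the distance Howso computes from the trainee’s analyzed hyperparameters and feature weights.

```python
from __future__ import annotations

import math
from collections.abc import Callable, Sequence

Case = Sequence[float]


def pair_up(from_cases: Sequence[Case], to_cases: Sequence[Case]) -> list[tuple[Case, Case]]:
    """Pair 'from' and 'to' cases; a single case on either side pairs with every case on the other."""
    if len(from_cases) == 1:
        return [(from_cases[0], t) for t in to_cases]
    if len(to_cases) == 1:
        return [(f, to_cases[0]) for f in from_cases]
    if len(from_cases) != len(to_cases):
        raise ValueError("each side must hold 1 case or the same number of cases")
    return list(zip(from_cases, to_cases))


def pairwise_distances(
    from_cases: Sequence[Case],
    to_cases: Sequence[Case],
    distance: Callable[[Case, Case], float] = math.dist,
) -> list[float]:
    # math.dist is a Euclidean placeholder, not Howso's distance computation.
    return [distance(f, t) for f, t in pair_up(from_cases, to_cases)]


print(pairwise_distances([[0, 0]], [[3, 4], [6, 8]]))          # [5.0, 10.0]
print(pairwise_distances([[0, 0], [1, 1]], [[3, 4], [4, 5]]))  # [5.0, 5.0]
```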
- get_params(*, action_feature=None, context_features=None, mode=None, weight_feature=None)
Get the parameters used by the Trainee.
If action_feature, context_features, mode, or weight_feature are specified, the value of the “hyperparameter_map” key is the best hyperparameters analyzed in the Trainee; otherwise, this value is the dictionary containing all the hyperparameter sets in the Trainee.
- Parameters:
action_feature (str | None, default: None) – If specified, returns the best analyzed hyperparameters to target this feature.
context_features (Collection[str] | None, default: None) – If specified, finds and returns the best analyzed hyperparameters to use with these context features.
mode (Literal['robust', 'full'] | None, default: None) – If specified, finds and returns the best analyzed hyperparameters that were computed in this mode.
weight_feature (str | None, default: None) – If specified, finds and returns the best analyzed hyperparameters that were analyzed using this weight feature.
- Returns:
A dict including either all of the Trainee’s internal parameters or only the best hyperparameters selected using the passed parameters.
- Return type:
dict[str, Any]
- get_runtime()#
The runtime details of the Trainee.
- Returns:
The Trainee runtime details. Including Trainee version and configuration parameters.
- Return type:
TraineeRuntime
- get_session_indices(session)#
Get all session indices for a specified session.
- Parameters:
session (
str
|Session
) – The id or instance of the session to retrieve indices for from the model.- Returns:
An index of the session indices for the requested session.
- Return type:
Index
- get_session_training_indices(session)#
Get all session training indices for a specified session.
- Parameters:
session (
str
|Session
) – The id or instance of the session to retrieve training indices for from the model.- Returns:
An index of the session training indices for the requested session.
- Return type:
Index
- get_sessions()#
Get all session ids of the trainee.
- Returns:
A list of dicts with keys “id” and “name” for each session in the model.
- Return type:
list[dict[str, str]]
- get_substitute_feature_values(*, clear_on_get=True)#
Get a substitution map for use in extended nominal generation.
- Parameters:
clear_on_get (
bool
, default:True
) – Clears the substitution values map in the trainee upon retrieving them. This is done if it is desired to prevent the substitution map from being persisted. If set to False, the model will not be cleared which preserves substitution mappings if the model is saved; representing a potential privacy leak should the substitution map be made public.- Returns:
A dictionary of feature name to a dictionary of feature value to substitute feature value.
- Return type:
dict[str, dict[str, Any]]
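To make the shape of the returned map concrete, here is a sketch using a hypothetical map. `apply_substitution` and `invert_substitution` are illustrative helpers, not Howso functions. The inversion also shows why the map is sensitive: anyone holding it can reverse the substitution, which is the privacy leak the `clear_on_get` description warns about.

```python
from typing import Any, Dict

# Shape documented for get_substitute_feature_values():
# feature name -> {original value -> substitute value}
SubstitutionMap = Dict[str, Dict[Any, Any]]


def apply_substitution(row: Dict[str, Any], subs: SubstitutionMap) -> Dict[str, Any]:
    """Swap each value for its substitute when its feature has a mapping."""
    # Features without a map, or values missing from the map, pass through unchanged.
    return {
        feature: subs.get(feature, {}).get(value, value)
        for feature, value in row.items()
    }


def invert_substitution(subs: SubstitutionMap) -> SubstitutionMap:
    """Build the reverse (substitute -> original) map for each feature."""
    return {
        feature: {sub: orig for orig, sub in mapping.items()}
        for feature, mapping in subs.items()
    }


# Hypothetical map and row, for illustration only.
subs = {"city": {"Boston": "City_A", "Denver": "City_B"}}
row = {"city": "Boston", "age": 41}

masked = apply_substitution(row, subs)
restored = apply_substitution(masked, invert_substitution(subs))
print(masked)    # {'city': 'City_A', 'age': 41}
print(restored)  # {'city': 'Boston', 'age': 41}
```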
- impute(*, batch_size=1, features=None, features_to_impute=None)#
Impute (fill) the missing values for the specified features_to_impute.
If no features are specified, will use all features in the trainee for imputation. If no features_to_impute are specified, will impute all features specified by features.
- Parameters:
batch_size (int, default: 1) – Larger batch sizes will increase speed but decrease accuracy. Batch size indicates how many rows to fill before recomputing conviction.
The default value (which is 1) should return the best accuracy but might be slower. Higher values should improve performance but may decrease accuracy of results.
features (Collection[str] | None, default: None) – A list of feature names to use for imputation. If not specified, all features will be used.
features_to_impute (Collection[str] | None, default: None) – A list of feature names to impute. If not specified, the features given by features will be used.
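A minimal sketch of how the `features` / `features_to_impute` defaults resolve, and of how `batch_size` groups rows between conviction recomputations. Both helpers are hypothetical and only restate the documented rules; the actual imputation runs inside the engine.

```python
from typing import List, Optional, Sequence, Tuple


def resolve_impute_features(
    trainee_features: Sequence[str],
    features: Optional[Sequence[str]] = None,
    features_to_impute: Optional[Sequence[str]] = None,
) -> Tuple[List[str], List[str]]:
    """Apply the documented defaults for features and features_to_impute."""
    # No features -> use every feature in the trainee.
    used = list(features) if features is not None else list(trainee_features)
    # No features_to_impute -> impute every feature in `used`.
    to_impute = list(features_to_impute) if features_to_impute is not None else list(used)
    return used, to_impute


def fill_batches(row_ids: Sequence[int], batch_size: int = 1) -> List[List[int]]:
    """Group rows so that conviction would be recomputed once per batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(row_ids[i:i + batch_size]) for i in range(0, len(row_ids), batch_size)]


print(resolve_impute_features(["age", "income", "zip"], features=["age", "income"]))
# (['age', 'income'], ['age', 'income'])
print(fill_batches([0, 1, 2, 3, 4], batch_size=2))
# [[0, 1], [2, 3], [4]]
```

With `batch_size=1` every row gets its own batch, which matches the documented trade-off: most recomputation, best expected accuracy.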
- information()#
The runtime details of the Trainee.
Deprecated: Use trainee.get_runtime() instead.
- Return type:
TraineeRuntime
- persist()#
Persist the trainee.
- Return type:
None
- predict(contexts=None, action_features=None, *, allow_nulls=False, case_indices=None, context_features=None, derived_action_features=None, derived_context_features=None, leave_case_out=False, suppress_warning=False, use_case_weights=None, weight_feature=None)#
Wrapper around react().
Performs a discriminative react to predict the action feature values based on the given contexts. Returns only the predicted action values.
- Parameters:
contexts (DataFrame | list[list[Any]] | None, default: None) – The context values to react to. If neither this nor context_values is specified, then case_indices must be specified.
action_features (Collection[str] | None, default: None) – Feature names to treat as action features during react.
allow_nulls (bool, default: False) – See parameter allow_nulls in react().
case_indices (Sequence[tuple[str, int]] | None, default: None) – Case indices to react to in lieu of contexts or context_values. If these are not specified, one of contexts or context_values must be specified.
context_features (Collection[str] | None, default: None) – Feature names to treat as context features during react. If no context_features are specified, then this will be all of the features excluding the action_features.
derived_action_features (Collection[str] | None, default: None) – See parameter derived_action_features in react().
derived_context_features (Collection[str] | None, default: None) – See parameter derived_context_features in react().
leave_case_out (bool, default: False) – See parameter leave_case_out in react().
suppress_warning (bool, default: False) – See parameter suppress_warning in react().
use_case_weights (bool | None, default: None) – See parameter use_case_weights in react().
weight_feature (str | None, default: None) – See parameter weight_feature in react().
- Returns:
DataFrame consisting of the discriminative predicted results.
- Return type:
DataFrame
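The default for `context_features` in `predict()` (all features except the `action_features`) can be restated as a hypothetical helper. Preserving the original feature order is an assumption of this sketch, not something the reference guarantees.

```python
from typing import Collection, List, Optional, Sequence


def default_context_features(
    features: Sequence[str],
    action_features: Collection[str],
    context_features: Optional[Collection[str]] = None,
) -> List[str]:
    """Mirror the documented default for predict()'s context_features."""
    if context_features is not None:
        # Explicit context features are used as given.
        return list(context_features)
    excluded = set(action_features)
    # Otherwise: every feature except the action features, in original order.
    return [name for name in features if name not in excluded]


print(default_context_features(["age", "income", "label"], ["label"]))
# ['age', 'income']
```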
- react(contexts=None, *, action_features=None, actions=None, allow_nulls=False, batch_size=None, case_indices=None, context_features=None, derived_action_features=None, derived_context_features=None, post_process_features=None, post_process_values=None, desired_conviction=None, details=None, exclude_novel_nominals_from_uniqueness_check=False, feature_bounds_map=None, generate_new_cases='no', initial_batch_size=None, input_is_substituted=False, into_series_store=None, leave_case_out=False, new_case_threshold='min', num_cases_to_generate=1, ordered_by_specified_features=False, preserve_feature_values=None, progress_callback=None, substitute_output=True, suppress_warning=False, use_case_weights=None, use_regional_model_residuals=True, weight_feature=None)#
React to the provided contexts.
If desired_conviction is specified, executes a generative react, producing action_values for the specified action_features conditioned on the optionally provided contexts.
If desired_conviction is not specified, executes a discriminative react. Provided a list of contexts, the trainee reacts to the model and produces predictions for the specified actions.
- Parameters:
contexts (DataFrame | list[list[Any]] | None, default: None) – The context values to react to.
action_features (Collection[str] | None, default: None) – Feature names to treat as action features during react.
actions (DataFrame | list[list[Any]] | None, default: None) – One or more action values to use for action features. If specified, will only return the specified explanation details for the given actions. (Discriminative reacts only)
allow_nulls (bool, default: False) – When True, will allow return of null values if there are nulls in the local model for the action features. Applicable only to discriminative reacts.
batch_size (int | None, default: None) – Define the number of cases to react to at once. If left unspecified, the batch size will be determined automatically.
case_indices (Sequence[tuple[str, int]] | None, default: None) – Iterable of Sequences, of session id and index, where index is the original 0-based index of the case as it was trained into the session. If this case does not exist, discriminative react outputs null, and generative react ignores it.
context_features (Collection[str] | None, default: None) – Feature names to treat as context features during react.
derived_action_features (Collection[str] | None, default: None) – Features whose values should be computed after reaction from the resulting case prior to output, in the specified order. Must be a subset of action_features.
Note
Both of these derived feature lists rely on the features’ “derived_feature_code” attribute to compute the values. If the “derived_feature_code” attribute is undefined or references a non-0 feature index, the derived value will be null.
derived_context_features (Collection[str] | None, default: None) – Features whose values should be computed from the provided context in the specified order.
post_process_features (Collection[str] | None, default: None) – List of feature names that will be made available during the execution of post_process feature attributes.
post_process_values (DataFrame | list[list[Any]] | None, default: None) – A 2d list of values corresponding to post_process_features that will be made available during the execution of post_process feature attributes.
desired_conviction (float | None, default: None) – If specified, will execute a generative react. If not specified, will execute a discriminative react. Conviction is the ratio of expected surprisal to generated surprisal for each feature generated; valid values are in the range \((0,\infty]\).
details (Mapping[str, Any] | None, default: None) – If details are specified, the response will contain the requested explanation data along with the reaction. Below are the valid keys and data types for the different audit details. Omitted keys, values set to None, or False values for Booleans will not be included in the audit data returned.
- boundary_cases : bool, optional
If True, outputs an automatically determined (when ‘num_boundary_cases’ is not specified) relevant number of boundary cases. Uses both context and action features of the reacted case to determine the counterfactual boundary based on action features, which maximize the dissimilarity of action features while maximizing the similarity of context features. If action features aren’t specified, uses familiarity conviction to determine the boundary instead.
- boundary_cases_familiarity_convictions : bool, optional
If True, outputs familiarity conviction of addition for each of the boundary cases.
- case_contributions_full : bool, optional
If true outputs each influential case’s differences between the predicted action feature value and the predicted action feature value if each individual case were not included. Uses only the context features of the reacted case to determine that area. Uses full calculations, which uses leave-one-out for cases for computations.
- case_contributions_robust : bool, optional
If true outputs each influential case’s differences between the predicted action feature value and the predicted action feature value if each individual case were not included. Uses only the context features of the reacted case to determine that area. Uses robust calculations, which uses uniform sampling from the power set of all combinations of cases.
- case_feature_residuals_full : bool, optional
If True, outputs feature residuals for all (context and action) features for just the specified case. Uses leave-one-out for each feature, while using the others to predict the left out feature with their corresponding values from this case. Uses full calculations, which uses leave-one-out for cases for computations.
- case_feature_residuals_robust : bool, optional
If True, outputs feature residuals for all (context and action) features for just the specified case. Uses leave-one-out for each feature, while using the others to predict the left out feature with their corresponding values from this case. Uses robust calculations, which uses uniform sampling from the power set of features as the contexts for predictions.
- case_mda_robust : bool, optional
If True, outputs each influential case’s mean decrease in accuracy of predicting the action feature in the local model area, as if each individual case were included versus not included. Uses only the context features of the reacted case to determine that area. Uses robust calculations, which uses uniform sampling from the power set of all combinations of cases.
- case_mda_full : bool, optional
If True, outputs each influential case’s mean decrease in accuracy of predicting the action feature in the local model area, as if each individual case were included versus not included. Uses only the context features of the reacted case to determine that area. Uses full calculations, which uses leave-one-out for cases for computations.
- categorical_action_probabilities : bool, optional
If True, outputs probabilities for each class for the action. Applicable only to categorical action features.
- derivation_parameters : bool, optional
If True, outputs a dictionary of the parameters used in the react call. These include k, p, distance_transform, feature_weights, feature_deviations, nominal_class_counts, and use_irw.
k: the number of cases used for the local model.
p: the parameter for the Lebesgue space.
distance_transform: the distance transform used as an exponent to convert distances to raw influence weights.
feature_weights: the weight for each feature used in the distance metric.
feature_deviations: the deviation for each feature used in the distance metric.
nominal_class_counts: the number of unique values for each nominal feature. This is used in the distance metric.
use_irw: a flag indicating if feature weights were derived using inverse residual weighting.
- distance_contribution : bool, optional
If True, outputs the distance contribution (expected total surprisal contribution) for the reacted case. Uses both context and action feature values.
- distance_ratio : bool, optional
If True, outputs the ratio of distance (relative surprisal) between this reacted case and its nearest case to the minimum distance (relative surprisal) between the two closest cases in the local area. All distances are computed using only the specified context features.
- feature_contributions_robust : bool, optional
If True, outputs each context feature’s absolute and directional differences between the predicted action feature value and the predicted action feature value if each context were not in the model for all context features in the local model area. Uses robust calculations, which uses uniform sampling from the power set of features as the contexts for predictions. Directional feature contributions are returned under the key ‘directional_feature_contributions_robust’.
- feature_contributions_full : bool, optional
If True outputs each context feature’s absolute and directional differences between the predicted action feature value and the predicted action feature value if each context were not in the model for all context features in the local model area. Uses full calculations, which uses leave-one-out for cases for computations. Directional feature contributions are returned under the key ‘directional_feature_contributions_full’.
- case_feature_contributions_robust : bool, optional
If True outputs each context feature’s absolute and directional differences between the predicted action feature value and the predicted action feature value if each context feature were not in the model for all context features in this case, using only the values from this specific case. Uses robust calculations, which uses uniform sampling from the power set of features as the contexts for predictions. Directional case feature contributions are returned under the ‘case_directional_feature_contributions_robust’ key.
- case_feature_contributions_full : bool, optional
If True outputs each context feature’s absolute and directional differences between the predicted action feature value and the predicted action feature value if each context feature were not in the model for all context features in this case, using only the values from this specific case. Uses full calculations, which uses leave-one-out for cases for computations. Directional case feature contributions are returned under the ‘case_directional_feature_contributions_full’ key.
- feature_mda_robust : bool, optional
If True, outputs each context feature’s mean decrease in accuracy of predicting the action feature given the context. Uses only the context features of the reacted case to determine that area. Uses robust calculations, which uses uniform sampling from the power set of features as the contexts for predictions.
- feature_mda_full : bool, optional
If True, outputs each context feature’s mean decrease in accuracy of predicting the action feature given the context. Uses only the context features of the reacted case to determine that area. Uses full calculations, which uses leave-one-out for cases for computations.
- feature_mda_ex_post_robust : bool, optional
If True, outputs each context feature’s mean decrease in accuracy of predicting the action feature as an explanation detail given that the specified prediction was already made as specified by the action value. Uses both context and action features of the reacted case to determine that area. Uses robust calculations, which uses uniform sampling from the power set of features as the contexts for predictions.
- feature_mda_ex_post_full : bool, optional
If True, outputs each context feature’s mean decrease in accuracy of predicting the action feature as an explanation detail given that the specified prediction was already made as specified by the action value. Uses both context and action features of the reacted case to determine that area. Uses full calculations, which uses leave-one-out for cases for computations.
- features : list of str, optional
A list of feature names specifying the features for which per-feature details (residuals, contributions, mda, etc.) will be computed. This generally saves computation, but will not when computing details robustly. Details will be computed for all context and action features if this value is not specified.
- feature_residual_robust : bool, optional
If True, outputs feature residuals for all (context and action) features locally around the prediction. Uses only the context features of the reacted case to determine that area. Uses robust calculations, which uses uniform sampling from the power set of features as the contexts for predictions.
- feature_residuals_full : bool, optional
If True, outputs feature residuals for all (context and action) features locally around the prediction. Uses only the context features of the reacted case to determine that area. Uses full calculations, which uses leave-one-out for cases for computations.
- hypothetical_values : dict, optional
A dictionary of feature name to feature value. If specified, shows how a prediction could change in a what-if scenario where the influential cases’ context feature values are replaced with the specified values. Iterates over all influential cases, predicting the action features for each one using the updated hypothetical values. Outputs the predicted arithmetic mean over the influential cases for each action feature.
- influential_cases : bool, optional
If True, outputs the most influential cases and their influence weights based on the surprisal of each case relative to the context being predicted among the cases. Uses only the context features of the reacted case.
- influential_cases_familiarity_convictions : bool, optional
If True, outputs familiarity conviction of addition for each of the influential cases.
- influential_cases_raw_weights : bool, optional
If True, outputs the surprisal for each of the influential cases.
- case_feature_residual_convictions_robust : bool, optional
If True, outputs this case’s feature residual convictions for the region around the prediction. Uses only the context features of the reacted case to determine that region. Computed as: region feature residual divided by case feature residual. Uses robust calculations, which uses uniform sampling from the power set of features as the contexts for predictions.
- case_feature_residual_convictions_full : bool, optional
If True, outputs this case’s feature residual convictions for the region around the prediction. Uses only the context features of the reacted case to determine that region. Computed as: region feature residual divided by case feature residual. Uses full calculations, which uses leave-one-out for cases for computations.
- most_similar_cases : bool, optional
If True, outputs an automatically determined (when ‘num_most_similar_cases’ is not specified) relevant number of similar cases, which will first include the influential cases. Uses only the context features of the reacted case.
- num_boundary_cases : int, optional
Outputs this manually specified number of boundary cases.
- num_most_similar_cases : int, optional
Outputs this manually specified number of most similar cases, which will first include the influential cases.
- num_most_similar_case_indices : int, optional
Outputs this specified number of most similar case indices when ‘distance_ratio’ is also set to True.
- num_robust_influence_samples_per_case : int, optional
Specifies the number of robust samples to use for each case. Applicable only for computing robust feature contributions or robust case feature contributions. Defaults to 2000. Higher values will take longer but provide more stable results.
- observational_errors : bool, optional
If True, outputs observational errors for all features as defined in feature attributes.
- outlying_feature_values : bool, optional
If True, outputs the reacted case’s context feature values that are outside the min or max of the corresponding feature values of all the cases in the local model area. Uses only the context features of the reacted case to determine that area.
- prediction_stats : bool, optional
When true outputs feature prediction stats for all (context and action) features locally around the prediction. The stats returned are (“r2”, “rmse”, “spearman_coeff”, “precision”, “recall”, “accuracy”, “mcc”, “confusion_matrix”, “missing_value_accuracy”). Uses only the context features of the reacted case to determine that area. Uses full calculations, which uses leave-one-out context features for computations.
- selected_prediction_stats : list, optional
List of stats to output. When unspecified, returns all except the confusion matrix. Allowed values:
all : Returns all the available prediction stats, including the confusion matrix.
accuracy : The number of correct predictions divided by the total number of predictions.
confusion_matrix : A sparse map of actual feature value to a map of predicted feature value to counts.
mae : Mean absolute error. For continuous features, this is calculated as the mean of absolute values of the difference between the actual and predicted values. For nominal features, this is 1 - the average categorical action probability of each case’s correct classes. Categorical action probabilities are the probabilities for each class for the action feature.
mda : Mean decrease in accuracy when each feature is dropped from the model, applies to all features.
feature_mda_permutation_full : Mean decrease in accuracy that used scrambling of feature values instead of dropping each feature, applies to all features.
precision : Precision (positive predictive) value for nominal features only.
r2 : The r-squared coefficient of determination, for continuous features only.
recall : Recall (sensitivity) value for nominal features only.
rmse : Root mean squared error, for continuous features only.
spearman_coeff : Spearman’s rank correlation coefficient, for continuous features only.
mcc : Matthews correlation coefficient, for nominal features only.
- similarity_conviction : bool, optional
If True, outputs similarity conviction for the reacted case. Uses both context and action feature values as the case values for all computations. This is defined as expected (local) distance contribution divided by reacted case distance contribution.
- generate_attempts : bool, optional
If True outputs the number of attempts taken to generate each case. Only applicable when ‘generate_new_cases’ is “always” or “attempt”.
exclude_novel_nominals_from_uniqueness_check (bool, default: False) – If True, will exclude features which have a subtype defined in their feature attributes from the uniqueness check that happens when generate_new_cases is enabled. Only applies to generative reacts.
feature_bounds_map (Mapping[str, Mapping[str, Any]] | None, default: None) – A mapping of feature names to the bounds for the feature values to be generated in. For continuous features this should be a numeric value; for datetimes this should be a datetime string or a numeric epoch value. Min bounds should be equal to or smaller than max bounds, except when setting the bounds around the cycle length of a cyclic feature (e.g., to allow 0 +/- 60 degrees, set min=300 and max=60).
Example:
{ "feature_a": {"min": 0}, "feature_b" : {"min": 1, "max": 5}, "feature_c": {"max": 1} }
generate_new_cases (Literal['always', 'attempt', 'no'], default: 'no') – This parameter takes in a string that may be one of the following:
attempt: Synthesizer attempts to generate new cases, and if it’s not possible to generate a new case, it might generate cases in “no” mode (see “no” below).
always: Synthesizer always generates new cases, and if it’s not possible to generate a new case, it returns None.
no: Synthesizer generates data based on the desired_conviction specified, and the generated data is not guaranteed to be a new case (that is, a case not found in the original dataset).
initial_batch_size (int | None, default: None) – Define the number of cases to react to in the first batch. If unspecified, a default defined by the react_initial_batch_size property of the selected client will be used. The number of cases in following batches will be automatically adjusted. This value is ignored if batch_size is specified.
input_is_substituted (bool, default: False) – When True, assumes provided categorical (nominal or ordinal) feature values have already been substituted.
into_series_store (str | None, default: None) – The name of a series store. If specified, will store an internal record of all react contexts for this session and series to be used later with train series.
leave_case_out (bool, default: False) – When True and specified along with case_indices, each individual react will respectively ignore the corresponding case specified by case_indices by leaving it out.
new_case_threshold (Literal['max', 'min', 'most_similar'], default: 'min') – Distance to determine the privacy cutoff.
Possible values:
min: minimum distance in the original local space.
max: maximum distance in the original local space.
most_similar: distance from the nearest neighbor to its own nearest neighbor in the original space.
num_cases_to_generate (int, default: 1) – The number of cases to generate.
ordered_by_specified_features (bool, default: False) – When True, the order of generated feature values will match the order of specified features.
preserve_feature_values (Collection[str] | None, default: None) – Features that will preserve their values from the case specified by case_indices, appending and overwriting the specified contexts as necessary. For generative reacts, if case_indices isn’t specified, will preserve feature values of a random case.
progress_callback (Callable | None, default: None) – A callback method that will be called before each batched call to react and at the end of reacting. The method is given a ProgressTimer containing metrics on the progress and timing of the react operation, and the batch result.
substitute_output (bool, default: True) – When False, will not substitute categorical feature values. Only applicable if a substitution value map has been set.
suppress_warning (bool, default: False) – When True, warnings will not be displayed.
use_case_weights (bool | None, default: None) – When True, will scale influence weights by each case’s weight_feature weight. If unspecified, case weights will be used if the Trainee has them.
use_regional_model_residuals (bool, default: True) – When False, uses model feature residuals. When True, recalculates regional model residuals.
weight_feature (str | None, default: None) – Name of the feature whose values to use as case weights. When left unspecified, uses the internally managed case weight.
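The cyclic case in feature_bounds_map, where min is larger than max and the allowed interval wraps around the cycle, can be illustrated with a small checker. `within_bounds` is a hypothetical helper, and the 360-degree cycle length is an assumption taken from the documented "0 +/- 60 degrees" example; the engine applies bounds internally during generation.

```python
from typing import Mapping, Optional


def within_bounds(
    value: float,
    bounds: Mapping[str, float],
    cycle_length: Optional[float] = None,
) -> bool:
    """Check a value against one feature's entry in a feature_bounds_map.

    Hypothetical helper: when min > max on a cyclic feature, the allowed
    interval wraps around the cycle, so it is [min, cycle_length) plus [0, max].
    """
    lo, hi = bounds.get("min"), bounds.get("max")
    if cycle_length is not None:
        value = value % cycle_length  # normalize onto a single cycle
    if lo is not None and hi is not None and lo > hi:
        if cycle_length is None:
            raise ValueError("min > max is only meaningful for a cyclic feature")
        return value >= lo or value <= hi
    # Ordinary (non-wrapping) bounds; either side may be omitted.
    if lo is not None and value < lo:
        return False
    if hi is not None and value > hi:
        return False
    return True


# The documented example: allow 0 +/- 60 degrees on a 360-degree cycle.
heading_bounds = {"min": 300, "max": 60}
print([within_bounds(v, heading_bounds, 360) for v in (330, 0, 45, 180)])
# [True, True, True, False]
```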
- Returns:
- A MutableMapping (dict-like) with these keys -> values:
- action -> DataFrame
A data frame of action values.
- details -> dict or list
An aggregated list of any requested details.
- Return type:
Reaction
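Several of the stats listed under selected_prediction_stats in the details parameter have simple closed forms. The sketch below restates accuracy, mae (continuous and nominal), and rmse in plain Python on toy values. It is not the engine's implementation, which computes these over the local model area; the function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def accuracy(actual: Sequence[str], predicted: Sequence[str]) -> float:
    """Correct predictions divided by total predictions."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def mae_continuous(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mae_nominal(actual: Sequence[str], probabilities: Sequence[Dict[str, float]]) -> float:
    """1 minus the average probability given to each case's correct class."""
    correct = [probs.get(a, 0.0) for a, probs in zip(actual, probabilities)]
    return 1.0 - sum(correct) / len(correct)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Toy values, for illustration only.
print(accuracy(["a", "b", "b"], ["a", "a", "b"]))         # 2 of 3 correct
print(mae_continuous([3.0, 5.0, 2.0], [2.5, 5.0, 4.0]))  # (0.5 + 0 + 2) / 3
print(mae_nominal(["a", "b"], [{"a": 0.8, "b": 0.2}, {"a": 0.4, "b": 0.6}]))  # 1 - (0.8 + 0.6) / 2
print(rmse([3.0, 5.0, 2.0], [2.5, 5.0, 4.0]))            # sqrt((0.25 + 0 + 4) / 3)
```

For nominal features, mae rewards confident correct probabilities rather than only the top-ranked class, which is why it can differ from 1 - accuracy.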
- react_aggregate(*, action_feature=None, action_features=None, confusion_matrix_min_count=None, context_features=None, details=None, feature_influences_action_feature=None, hyperparameter_param_path=None, num_robust_influence_samples=None, num_robust_residual_samples=None, num_robust_influence_samples_per_case=None, num_samples=None, prediction_stats_action_feature=None, residuals_hyperparameter_feature=None, robust_hyperparameters=None, sample_model_fraction=None, sub_model_size=None, use_case_weights=None, weight_feature=None)#
Reacts into the aggregate trained cases in the Trainee.
Calculates, caches, and/or returns the requested influences and prediction stats.
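The feature_contributions_full and feature_contributions_robust details described below each report two aggregates of the same per-case deltas: the mean absolute delta and the signed (directional) mean delta. The sketch shows how the two can diverge. `contribution_summary` is hypothetical; the sign convention (prediction with the feature minus prediction without it) is an assumption; and the engine's full and robust variants differ in how the "without" predictions are obtained, which this sketch does not model.

```python
from typing import Dict, Sequence


def contribution_summary(
    with_feature: Sequence[float], without_feature: Sequence[float]
) -> Dict[str, float]:
    """Summarize per-case prediction deltas for one context feature.

    Assumes delta = prediction with the feature minus prediction without it.
    """
    deltas = [w - wo for w, wo in zip(with_feature, without_feature)]
    n = len(deltas)
    return {
        # Magnitude only: positive and negative deltas both add up.
        "feature_contributions_full": sum(abs(d) for d in deltas) / n,
        # Signed mean: opposite-signed deltas partially cancel.
        "directional_feature_contributions_full": sum(deltas) / n,
    }


# Toy predictions of the action feature for three cases (deltas 2, -1, 0).
print(contribution_summary([10.0, 12.0, 9.0], [8.0, 13.0, 9.0]))
```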
- Parameters:
action_feature (str | None, default: None) – Name of the target feature for which to do computations. If prediction_stats_action_feature and feature_influences_action_feature are not provided, they will default to this value. If feature_influences_action_feature is not provided and feature influences details are selected, this feature must be provided.
action_features (Collection[str] | None, default: None) – List of feature names to compute any requested residuals or prediction statistics for. If unspecified, the value used for context features will be used.
confusion_matrix_min_count (int | None, default: None) – The number of predictions a class should have (value of a cell in the matrix) for it to remain in the confusion matrix. If the count is less than this value, it will be accumulated into a single value of all insignificant predictions for the class and removed from the confusion matrix. Defaults to 10; applicable only to confusion matrices when computing residuals.
context_features (Collection[str] | None, default: None) – List of feature names to use as contexts for computations. Default is all trained non-unique features if unspecified.
details (dict | None, default: None) – If details are specified, the response will contain the requested explanation data. Below are the valid keys and data types for the different audit details. Omitted keys, values set to None, or False values for Booleans will not be included in the data returned.
- prediction_stats : bool, optional
If True outputs full feature prediction stats for all (context and action) features. The prediction stats returned are set by the “selected_prediction_stats” parameter in the details parameter. Uses full calculations, which uses leave-one-out for features for computations.
- feature_residuals_full : bool, optional
For each context_feature, use the full set of all other context_features to predict the feature. When prediction_stats in the details parameter is true, the Trainee will also calculate the full feature residuals.
- feature_residuals_robust : bool, optional
For each context_feature, use the robust (power set/permutations) set of all other context_features to predict the feature.
- feature_contributions_full : bool, optional
For each context_feature, use the full set of all other context_features to compute the mean absolute delta between the prediction of the action feature with and without the context features in the model. Returns the mean absolute delta under the key ‘feature_contributions_full’ and returns the mean delta under the key ‘directional_feature_contributions_full’.
- feature_contributions_robust : bool, optional
For each context_feature, use the robust (power set/permutation) set of all other context_features to compute the mean absolute delta between prediction of the action feature with and without the context features in the model. Returns the mean absolute delta under the key ‘feature_contributions_robust’ and returns the mean delta under the key ‘directional_feature_contributions_robust’.
- feature_mda_full : bool, optional
When True will compute Mean Decrease in Accuracy (MDA) for each context feature at predicting the action feature. Drop each feature and use the full set of remaining context features for each prediction.
- feature_mda_robust : bool, optional
Compute Mean Decrease in Accuracy (MDA) by dropping each feature and using the robust (power set/permutations) set of remaining context features for each prediction.
- feature_feature_mda_permutation_full : bool, optional
Compute MDA by scrambling each feature and using the full set of remaining context features for each prediction.
- feature_feature_mda_permutation_robust : bool, optional
Compute MDA by scrambling each feature and using the robust (power set/permutations) set of remaining context features for each prediction.
- action_condition : map of str -> any, optional
A condition map to select the action set, which is the dataset for which the prediction stats are computed. If both action_condition and context_condition are provided, then all of the action cases selected by the action_condition will be excluded from the context set, which is the set being queried to make predictions on the action set, effectively holding them out. If only action_condition is specified, then only the single predicted case will be left out.
Note
The dictionary keys are the feature name and values are one of:
None
A value, must match exactly.
An array of two numeric values, specifying an inclusive range. Only applicable to continuous and numeric ordinal features.
An array of string values, must match any of these values exactly. Only applicable to nominal and string ordinal features.
- action_num_cases : int, optional
The maximum number of cases to use to calculate prediction stats. If not specified, the limit will be k cases if precision is “similar”, or 1000 cases if precision is “exact”. Works with or without action_condition.
  - If action_condition is set: if None, will be set to k if precision is “similar” or no limit if precision is “exact”.
  - If action_condition is not set: if None, will be set to the Howso default limit of 2000.
- action_condition_precision : {“exact”, “similar”}, optional
The precision to use when selecting cases with the action_condition. If not specified, “exact” will be used. Only used if action_condition is not None.
- context_condition : map of str -> any, optional
A condition map to select the context set, which is the set being queried to make predictions on the action set. If both action_condition and context_condition are provided, then all of the cases from the action set, which is the dataset for which the prediction stats are computed, will be excluded from the context set, effectively holding them out. If only action_condition is specified, then only the single predicted case will be left out.
Note
The dictionary keys are the feature name and values are one of:
None
A value, must match exactly.
An array of two numeric values, specifying an inclusive range. Only applicable to continuous and numeric ordinal features.
An array of string values, must match any of these values exactly. Only applicable to nominal and string ordinal features.
- context_precision_num_cases : int, optional
Limit on the number of context cases when context_condition_precision is set to “similar”. If None, will be set to k.
- context_condition_precision : {“exact”, “similar”}, optional
The precision to use when selecting cases with the context_condition. If not specified, “exact” will be used. Only used if context_condition is not None.
- prediction_stats_features : list, optional
List of features to use when calculating conditional prediction stats. Should contain all action and context features desired. If action_feature is also provided, that feature will automatically be appended to this list if it is not already in the list.
- selected_prediction_stats : list, optional
List of stats to output. When unspecified, returns all except the confusion matrix. Allowed values:
all : Returns all the available prediction stats, including the confusion matrix.
accuracy : The number of correct predictions divided by the total number of predictions.
confusion_matrix : A sparse map of actual feature value to a map of predicted feature value to counts.
mae : Mean absolute error. For continuous features, this is calculated as the mean of absolute values of the difference between the actual and predicted values. For nominal features, this is 1 - the average categorical action probability of each case’s correct classes. Categorical action probabilities are the probabilities for each class for the action feature.
mda : Mean decrease in accuracy when each feature is dropped from the model, applies to all features.
feature_mda_permutation_full : Mean decrease in accuracy that uses scrambling of feature values instead of dropping each feature, applies to all features.
precision : Precision (positive predictive) value for nominal features only.
r2 : The r-squared coefficient of determination, for continuous features only.
recall : Recall (sensitivity) value for nominal features only.
rmse : Root mean squared error, for continuous features only.
spearman_coeff : Spearman’s rank correlation coefficient, for continuous features only.
mcc : Matthews correlation coefficient, for nominal features only.
feature_influences_action_feature (str | None, default: None) – When computing feature influences such as contributions and MDA, use this feature as the action feature. If not provided, will default to the action_feature if provided. If action_feature is not provided and feature influences details are selected, this feature must be provided.
hyperparameter_param_path (Collection[str] | None, default: None) – Full path for hyperparameters to use for computation. If specified for any residual computations, takes precedence over the action_feature parameter. Can be set to a ‘paramPath’ value from the results of ‘get_params()’ for a specific set of hyperparameters.
num_robust_influence_samples (int | None, default: None) – Total sample size of model to use (using sampling with replacement) for robust contribution computation. Defaults to 300.
num_robust_residual_samples (int | None, default: None) – Total sample size of model to use (using sampling with replacement) for robust MDA and residual computation. Defaults to 1000 * (1 + log(number of features)). Note: robust MDA will be updated to use num_robust_influence_samples in a future release.
num_robust_influence_samples_per_case (int | None, default: None) – Specifies the number of robust samples to use for each case for robust contribution computations. Defaults to 300 + 2 * (number of features).
num_samples (int | None, default: None) – Total sample size of model to use (using sampling with replacement) for all non-robust computation. Defaults to 1000. If specified, overrides sample_model_fraction.
residuals_hyperparameter_feature (str | None, default: None) – When calculating residuals and prediction stats, uses this target feature’s hyperparameters. The trainee must have been analyzed with this feature as the action feature first. If not provided, residuals and prediction stats use the “.targetless” hyperparameters by default.
robust_hyperparameters (bool | None, default: None) – When specified, will attempt to return residuals that were computed using hyperparameters with the specified robust or non-robust type.
prediction_stats_action_feature (str | None, default: None) – When calculating residuals and prediction stats, uses this target feature’s hyperparameters. The trainee must have been analyzed with this feature as the action feature first. If neither prediction_stats_action_feature nor action_feature is provided, residuals and prediction stats use the “.targetless” hyperparameters by default. If action_feature is provided and this value is not, it will default to action_feature.
sample_model_fraction (float | None, default: None) – A value between 0.0 and 1.0, the fraction of the model to use in sampling (using sampling without replacement). Applicable only to non-robust computation. Ignored if num_samples is specified. Higher values provide better accuracy at the cost of compute time.
sub_model_size (int | None, default: None) – Subset of model to use for calculations. Applicable only to models with more than 1000 cases.
use_case_weights (bool | None, default: None) – If set to True, will scale influence weights by each case’s weight_feature weight. If unspecified, case weights will be used if the Trainee has them.
weight_feature (str | None, default: None) – The name of the feature whose values to use as case weights. When left unspecified, uses the internally managed case weight.
- Returns:
If specified, a DataFrame of feature name columns to stat value rows. Indexed by the stat or detail type. The return type depends on the underlying client.
- Return type:
DataFrame
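The stat definitions listed under selected_prediction_stats can be made concrete with a few plain-Python helpers. This is a minimal sketch of the formulas as written above, not Howso's implementation: the function names are hypothetical, and the engine computes these stats over its own sampled action and context sets.

```python
import math
from typing import Mapping, Sequence


def accuracy(actual: Sequence[str], predicted: Sequence[str]) -> float:
    """Number of correct predictions divided by the total number of predictions."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def nominal_mae(actual: Sequence[str], class_probs: Sequence[Mapping[str, float]]) -> float:
    """1 minus the average probability assigned to each case's correct class."""
    correct_probs = [probs.get(a, 0.0) for a, probs in zip(actual, class_probs)]
    return 1.0 - sum(correct_probs) / len(correct_probs)


def continuous_mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error (continuous features only)."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


def r2(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (continuous features only)."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up nominal example: probabilities 0.8 and 0.6 land on the correct classes.
example_mae = nominal_mae(["a", "b"], [{"a": 0.8, "b": 0.2}, {"a": 0.4, "b": 0.6}])
```

With those made-up probabilities the nominal MAE is 1 - (0.8 + 0.6) / 2, i.e. 0.3 up to floating-point rounding.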
- react_group(new_cases, *, distance_contributions=False, familiarity_conviction_addition=True, familiarity_conviction_removal=False, kl_divergence_addition=False, kl_divergence_removal=False, p_value_of_addition=False, p_value_of_removal=False, use_case_weights=None, features=None, weight_feature=None)#
Computes specified data for a set of cases.
Return the list of familiarity convictions (and optionally, distance contributions or \(p\) values) for each set.
- Parameters:
distance_contributions (bool, default: False) – Calculate and output distance contribution ratios in the output dict for each case.
familiarity_conviction_addition (bool, default: True) – Calculate and output familiarity conviction of adding the specified cases.
familiarity_conviction_removal (bool, default: False) – Calculate and output familiarity conviction of removing the specified cases.
features (Collection[str] | None, default: None) – A list of feature names to consider while calculating convictions.
kl_divergence_addition (bool, default: False) – Calculate and output KL divergence of adding the specified cases.
kl_divergence_removal (bool, default: False) – Calculate and output KL divergence of removing the specified cases.
new_cases (list[DataFrame] | list[list[list[Any]]]) – A list of case groups for which to compute conviction, where each group is itself a list of cases, as shown in the following example.
Example:
new_cases = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],  # Group 1
    [[1, 2, 3]],  # Group 2
]
p_value_of_addition (bool, default: False) – If True, will output the \(p\) value of addition.
p_value_of_removal (bool, default: False) – If True, will output the \(p\) value of removal.
use_case_weights (bool | None, default: None) – When True, will scale influence weights by each case’s weight_feature weight. If unspecified, case weights will be used if the Trainee has them.
weight_feature (str | None, default: None) – Name of feature whose values to use as case weights. When left unspecified, uses the internally managed case weight.
- Returns:
The conviction of grouped cases.
- Return type:
DataFrame | dict
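The new_cases layout for react_group() is a list of groups, each group a list of cases, each case a list of feature values. The sketch below checks that nesting before a call is made. validate_case_groups is a hypothetical helper, and the rule that every case carries exactly one value per entry in features is an assumption made for this sketch rather than a documented constraint.

```python
from typing import Any, List, Sequence


def validate_case_groups(new_cases: Sequence[Sequence[Sequence[Any]]], num_features: int) -> List[int]:
    """Check the group/case/value nesting and return the number of cases per group.

    Raises ValueError when a case does not hold exactly num_features values
    (an assumed rule for this sketch).
    """
    sizes = []
    for g, group in enumerate(new_cases):
        for r, row in enumerate(group):
            if len(row) != num_features:
                raise ValueError(
                    f"group {g}, case {r}: expected {num_features} values, got {len(row)}"
                )
        sizes.append(len(group))
    return sizes


new_cases = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],  # Group 1: three cases
    [[1, 2, 3]],  # Group 2: one case
]
group_sizes = validate_case_groups(new_cases, num_features=3)
```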
- react_into_features(*, distance_contribution=False, familiarity_conviction_addition=False, familiarity_conviction_removal=False, features=None, influence_weight_entropy=False, p_value_of_addition=False, p_value_of_removal=False, similarity_conviction=False, use_case_weights=None, weight_feature=None)#
Calculate conviction and other data and store them into features.
- Parameters:
distance_contribution (str | bool, default: False) – The name of the feature in which to store distance contribution values. If set to True, the values will be stored in the feature ‘distance_contribution’.
familiarity_conviction_addition (str | bool, default: False) – The name of the feature in which to store conviction of addition values. If set to True, the values will be stored in the feature ‘familiarity_conviction_addition’.
familiarity_conviction_removal (str | bool, default: False) – The name of the feature in which to store conviction of removal values. If set to True, the values will be stored in the feature ‘familiarity_conviction_removal’.
features (Collection[str] | None, default: None) – A list of features to use when calculating convictions.
influence_weight_entropy (str | bool, default: False) – The name of the feature in which to store influence weight entropy values. If set to True, the values will be stored in the feature ‘influence_weight_entropy’.
p_value_of_addition (str | bool, default: False) – The name of the feature in which to store p value of addition values. If set to True, the values will be stored in the feature ‘p_value_of_addition’.
p_value_of_removal (str | bool, default: False) – The name of the feature in which to store p value of removal values. If set to True, the values will be stored in the feature ‘p_value_of_removal’.
similarity_conviction (str | bool, default: False) – The name of the feature in which to store similarity conviction values. If set to True, the values will be stored in the feature ‘similarity_conviction’.
use_case_weights (bool | None, default: None) – When True, will scale influence weights by each case’s weight_feature weight. If unspecified, case weights will be used if the Trainee has them.
weight_feature (str | None, default: None) – Name of feature whose values to use as case weights. When left unspecified, uses the internally managed case weight.
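Each output parameter of react_into_features() accepts either a feature name or True. A minimal sketch of that naming rule follows, assuming that False (the default) simply means nothing is stored for that parameter. resolve_output_features is a hypothetical helper; it does not call the Howso client.

```python
from typing import Dict, Union

# Parameters of react_into_features() that accept a feature name or True.
OUTPUT_PARAMS = (
    "distance_contribution",
    "familiarity_conviction_addition",
    "familiarity_conviction_removal",
    "influence_weight_entropy",
    "p_value_of_addition",
    "p_value_of_removal",
    "similarity_conviction",
)


def resolve_output_features(**flags: Union[str, bool]) -> Dict[str, str]:
    """Map each enabled parameter to the feature name its values would be stored in."""
    resolved = {}
    for name, value in flags.items():
        if name not in OUTPUT_PARAMS:
            raise TypeError(f"unexpected parameter: {name}")
        if value is True:
            resolved[name] = name  # True: store under the documented default name
        elif isinstance(value, str):
            resolved[name] = value  # str: store under the caller's feature name
        # False (the default): nothing is stored for this parameter.
    return resolved
```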
- react_series(contexts=None, *, action_features=None, actions=None, batch_size=None, case_indices=None, context_features=None, continue_series=False, continue_series_features=None, continue_series_values=None, derived_action_features=None, derived_context_features=None, desired_conviction=None, details=None, exclude_novel_nominals_from_uniqueness_check=False, feature_bounds_map=None, final_time_steps=None, generate_new_cases='no', series_index='.series', init_time_steps=None, initial_batch_size=None, initial_features=None, initial_values=None, input_is_substituted=False, leave_case_out=False, max_series_lengths=None, new_case_threshold='min', num_series_to_generate=1, ordered_by_specified_features=False, output_new_series_ids=True, preserve_feature_values=None, progress_callback=None, series_context_features=None, series_context_values=None, series_id_tracking='fixed', series_stop_maps=None, substitute_output=True, suppress_warning=False, use_case_weights=None, use_regional_model_residuals=True, weight_feature=None)#
React to the trainee in a series until a stop condition is met.
Aggregates rows of data corresponding to the specified context, action, derived_context and derived_action features, utilizing previous rows to derive values as necessary. Outputs a dict of “action_features” and corresponding “action” where “action” is the completed ‘matrix’ for the corresponding action_features and derived_action_features.
- Parameters:
contexts (DataFrame | list[list[Any]] | None, default: None) – The context values to react to.
action_features (Collection[str] | None, default: None) – See parameter action_features in react().
actions (DataFrame | list[list[Any]] | None, default: None) – See parameter actions in react().
batch_size (int | None, default: None) – Define the number of series to react to at once. If left unspecified, the batch size will be determined automatically.
case_indices (Sequence[tuple[str, int]] | None, default: None) – See parameter case_indices in react().
context_features (Collection[str] | None, default: None) – See parameter context_features in react().
continue_series (bool, default: False) – When True, will attempt to continue existing series instead of starting new series. If initial_values provide series IDs, it will continue those explicitly specified IDs; otherwise, it will randomly select series to continue.
Note
Terminated series with terminators cannot be continued and will result in null output.
continue_series_features (Collection[str] | None, default: None) – The list of feature names corresponding to the values in each row of continue_series_values. This value is ignored if continue_series_values is None.
continue_series_values (list[DataFrame] | list[list[list[Any]]] | None, default: None) – The set of series data to be forecasted, with feature values in the same order defined by continue_series_features. The value of continue_series will be ignored and treated as True if this value is specified.
derived_action_features (Collection[str] | None, default: None) – See parameter derived_action_features in react().
derived_context_features (Collection[str] | None, default: None) – See parameter derived_context_features in react().
desired_conviction (float | None, default: None) – See parameter desired_conviction in react().
details (Mapping[str, Any] | None, default: None) – See parameter details in react().
exclude_novel_nominals_from_uniqueness_check (bool, default: False) – If True, will exclude features which have a subtype defined in their feature attributes from the uniqueness check that happens when generate_new_cases is enabled. Only applies to generative reacts.
feature_bounds_map (Mapping[str, Mapping[str, Any]] | None, default: None) – See parameter feature_bounds_map in react().
final_time_steps (list[Any] | None, default: None) – The time steps at which to end synthesis. Time-series only. Must provide either one for all series, or exactly one per series.
generate_new_cases (Literal['always', 'attempt', 'no'], default: 'no') – See parameter generate_new_cases in react().
series_index (str, default: '.series') – When set to a string, will include the series index as a column in the returned DataFrame using the column name given. If set to None, no column will be added.
init_time_steps (list[Any] | None, default: None) – The time steps at which to begin synthesis. Time-series only. Must provide either one for all series, or exactly one per series.
initial_batch_size (int | None, default: None) – The number of series to react to in the first batch. If unspecified, the number will be determined automatically by the client. The number of series in following batches will be automatically adjusted. This value is ignored if batch_size is specified.
initial_features (Collection[str] | None, default: None) – Features to condition just the first case in a series; they overwrite context_features and derived_context_features for that first case. All specified initial features must be in one of: context_features, action_features, derived_context_features or derived_action_features. If provided a value that isn’t in one of those lists, it will be ignored.
initial_values (DataFrame | list[list[Any]] | None, default: None) – Values corresponding to the initial_features, used to condition just the first case in each series. Must provide either exactly one value to use for all series, or one per series.
input_is_substituted (bool, default: False) – See parameter input_is_substituted in react().
leave_case_out (bool, default: False) – See parameter leave_case_out in react().
max_series_lengths (list[int] | None, default: None) – The maximum size a series is allowed to be. Default is 3 * model_size; a value of 0 or less means no limit. If forecasting with continue_series, this defines the maximum length of the forecast. Must provide either one for all series, or exactly one per series.
new_case_threshold (Literal['max', 'min', 'most_similar'], default: 'min') – See parameter new_case_threshold in react().
num_series_to_generate (int, default: 1) – The number of series to generate.
ordered_by_specified_features (bool, default: False) – See parameter ordered_by_specified_features in react().
output_new_series_ids (bool, default: True) – If True, series IDs are replaced with unique values on output. If False, will maintain or replace IDs with existing trained values, but also allows output of series with duplicate existing IDs.
preserve_feature_values (list[str] | None, default: None) – See parameter preserve_feature_values in react().
progress_callback (Callable | None, default: None) – A callback method that will be called before each batched call to react series and at the end of reacting. The method is given a ProgressTimer containing metrics on the progress and timing of the react series operation, and the batch result.
series_context_features (Collection[str] | None, default: None) – List of context features corresponding to series_context_values. If specified, must not overlap with any initial_features or context_features.
series_context_values (list[DataFrame] | list[list[list[Any]]] | None, default: None) – 3D list of context values, one for each feature for each row for each series. If specified, batch_size and max_series_lengths are ignored.
series_id_tracking (Literal['fixed', 'dynamic', 'no'], default: 'fixed') – Controls how closely generated series should follow existing series (plural).
If “fixed”, tracks the particular relevant series ID.
If “dynamic”, tracks the particular relevant series ID, but is allowed to change the series ID that it tracks based on its current context.
If “no”, does not track any particular series ID.
series_stop_maps (list[Mapping[str, Mapping[str, Any]]] | None, default: None) – Map of series stop conditions. Must provide either exactly one to use for all series, or one per series.
Tip
Stop series when value exceeds max or is smaller than min:
{"feature_name": {"min" : 1, "max": 2}}
Stop series when feature value matches any of the values listed:
{"feature_name": {"values": ["val1", "val2"]}}
substitute_output (bool, default: True) – See parameter substitute_output in react().
suppress_warning (bool, default: False) – See parameter suppress_warning in react().
use_case_weights (bool | None, default: None) – See parameter use_case_weights in react().
use_regional_model_residuals (bool, default: True) – See parameter use_regional_model_residuals in react().
weight_feature (str | None, default: None) – See parameter weight_feature in react().
- Returns:
- A MutableMapping (dict-like) with these keys -> values:
- action -> DataFrame
A data frame of action values.
- details -> dict or list
An aggregated list of any requested details.
- Return type:
Reaction
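Two conventions recur in the react_series() parameters: per-series arguments accept either exactly one value for all series or one value per series, and series_stop_maps entries stop a series on min/max bounds or on a list of values. The sketch below models both in plain Python. The helper names are hypothetical, and treating several features in one stop map as alternatives (any one of them stops the series) is an assumption.

```python
from typing import Any, List, Mapping, Sequence


def broadcast_per_series(values: Sequence[Any], num_series: int) -> List[Any]:
    """Expand 'exactly one for all series, or one per series' into one value per series."""
    if len(values) == 1:
        return [values[0]] * num_series
    if len(values) == num_series:
        return list(values)
    raise ValueError(f"expected 1 or {num_series} values, got {len(values)}")


def should_stop(row: Mapping[str, Any], stop_map: Mapping[str, Mapping[str, Any]]) -> bool:
    """Return True if the row meets any stop condition in the map (OR is assumed)."""
    for feature, cond in stop_map.items():
        if feature not in row:
            continue
        value = row[feature]
        if "min" in cond and value < cond["min"]:
            return True  # value is smaller than min
        if "max" in cond and value > cond["max"]:
            return True  # value exceeds max
        if "values" in cond and value in cond["values"]:
            return True  # value matches one of the listed values
    return False


# One stop map shared by three series.
stop_maps = broadcast_per_series([{"temp": {"min": 1, "max": 2}}], num_series=3)
```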
- reduce_data(features=None, distribute_weight_feature=None, influence_weight_entropy_threshold=None, skip_auto_analyze=False, **kwargs)#
Smartly reduce the number of trained cases while accumulating case weights.
Determines which cases to remove by comparing the influence weight entropy of each trained case to the influence_weight_entropy_threshold quantile of existing influence weight entropies.
Note
All ablation endpoints, including reduce_data(), are experimental and may have their API changed without deprecation.
See also
The default distribute_weight_feature and influence_weight_entropy_threshold are pulled from the auto-ablation parameters, which can be set or retrieved with set_auto_ablation_params() and get_auto_ablation_params(), respectively.
- Parameters:
trainee_id – The ID of the Trainee for which to reduce data.
features (Collection[str] | None, default: None) – The features which should be used to determine which cases to remove. This defaults to all of the trained features (excluding internal features).
distribute_weight_feature (str | None, default: None) – The name of the weight feature to accumulate case weights to as cases are removed. This defaults to the value of auto_ablation_weight_feature from set_auto_ablation_params(), which defaults to “.case_weight”.
influence_weight_entropy_threshold (float | None, default: None) – The quantile of influence weight entropy above which cases will be removed. This defaults to the value of influence_weight_entropy_threshold from set_auto_ablation_params(), which defaults to 0.6.
skip_auto_analyze (bool, default: False) – Whether to skip auto-analyzing as cases are removed.
- Returns:
A dictionary for reporting experimental outputs of reduce data. Currently, the default non-experimental output is an empty dictionary.
- Return type:
dict
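The removal rule described for reduce_data() compares each case's influence weight entropy against a quantile of all entropies. Below is a sketch of just that selection step, assuming linear interpolation between sorted values for the quantile; Howso's exact quantile method and how it computes the entropies are not described here, and weight redistribution into distribute_weight_feature is not modeled. Both function names are hypothetical.

```python
import math
from typing import List, Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Quantile q (0..1) using linear interpolation between sorted values (assumed method)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def cases_to_remove(entropies: Sequence[float], threshold_quantile: float = 0.6) -> List[int]:
    """Indices of cases whose influence weight entropy lies above the quantile cut-off."""
    cutoff = quantile(entropies, threshold_quantile)
    return [i for i, entropy in enumerate(entropies) if entropy > cutoff]
```

The default of 0.6 mirrors the documented default for influence_weight_entropy_threshold; with five cases this keeps roughly the lower 60% and flags the rest.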
- release_resources()#
Release a trainee’s resources from the Howso service.
- remove_cases(num_cases, *, case_indices=None, condition=None, condition_session=None, distribute_weight_feature=None, precision=None)#
Remove training cases from the trainee.
The training cases will be completely purged from the model and the model will behave as if it had never been trained with them.
- Parameters:
num_cases (int) – The number of cases to remove; a minimum of 1 case must be removed. Ignored if case_indices is specified.
case_indices (Sequence[tuple[str, int]] | None, default: None) – A list of tuples containing the session ID and session training index for each case to be removed.
condition (Mapping[str, Any] | None, default: None) – The condition map to select the cases to remove that meet all the provided conditions. Ignored if case_indices is specified.
Note
The dictionary keys are the feature name and values are one of:
None
A value, must match exactly.
An array of two numeric values, specifying an inclusive range. Only applicable to continuous and numeric ordinal features.
An array of string values, must match any of these values exactly. Only applicable to nominal and string ordinal features.
Tip
Example 1 - Remove all values belonging to feature_name:
condition = {"feature_name": None}
Example 2 - Remove cases that have the value 10:
condition = {"feature_name": 10}
Example 3 - Remove cases that have a value in range [10, 20]:
condition = {"feature_name": [10, 20]}
Example 4 - Remove cases that match one of [‘a’, ‘c’, ‘e’]:
condition = {"feature_name": ['a', 'c', 'e']}
condition_session (str | Session | None, default: None) – If specified, ignores the condition and operates on cases for the specified session ID or Session instance. Ignored if case_indices is specified.
distribute_weight_feature (str | None, default: None) – When specified, will distribute the removed cases’ weights from this feature into their neighbors.
precision (Literal['exact', 'similar'] | None, default: None) – The precision to use when removing the cases. If not specified, “exact” will be used. Ignored if case_indices is specified.
- Returns:
The number of cases removed.
- Return type:
int
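The condition-map rules in the Note and Tip above can be expressed as a small matcher. This is a sketch for illustration, not the engine's query logic: value_matches and case_matches are hypothetical names, cases are modeled as plain dicts, and reading None as "select every case that has the feature" is an assumption drawn from Example 1.

```python
from numbers import Number
from typing import Any, Mapping


def value_matches(value: Any, expected: Any) -> bool:
    """Apply the documented condition-map value rules to one feature value."""
    if expected is None:
        # Assumption based on Example 1: None selects every case with the feature.
        return True
    if isinstance(expected, (list, tuple)):
        if len(expected) == 2 and all(isinstance(v, Number) for v in expected):
            low, high = expected
            return low <= value <= high  # inclusive numeric range
        return value in expected  # must match any of the listed values
    return value == expected  # a single value must match exactly


def case_matches(case: Mapping[str, Any], condition: Mapping[str, Any]) -> bool:
    """A case is selected only if it meets all of the provided conditions."""
    return all(
        feature in case and value_matches(case[feature], expected)
        for feature, expected in condition.items()
    )
```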
- remove_feature(feature, *, condition=None, condition_session=None)#
Remove a feature from the trainee.
Updates the accumulated data mass for the model proportional to the number of cases modified.
- Parameters:
feature (str) – The name of the feature to remove.
condition (Mapping[str, Any] | None, default: None) – A condition map where features will only be removed when certain criteria are met.
If None, the feature will be removed from all cases in the model and feature metadata will be updated to exclude it. If specified as an empty dict, the feature will still be removed from all cases in the model but the feature metadata will not be updated.
Note
The dictionary keys are the feature name and values are one of:
None
A value, must match exactly.
An array of two numeric values, specifying an inclusive range. Only applicable to continuous and numeric ordinal features.
An array of string values, must match any of these values exactly. Only applicable to nominal and string ordinal features.
Tip
For instance, to remove the length feature only when the value is between 1 and 5:
condition = {"length": [1, 5]}
condition_session (str | Session | None, default: None) – If specified, ignores the condition and operates on cases for the specified session ID or Session instance.
- remove_series_store(series=None)#
Clear stored series from trainee.
- Parameters:
series (str | None, default: None) – Series ID to clear. If not provided, clears the entire series store for the trainee.
- save(file_path=None)#
Save a Trainee to disk.
- Parameters:
file_path (str | PathLike | None, default: None) – The path of the file to save the Trainee to. This path can contain an absolute path, a relative path or simply a file name. If no file path is provided, the default location will be the CWD. If file_path is a relative path (with or without a file name), the absolute path will be computed by appending the file_path to the CWD. If file_path is an absolute path, this is the absolute path that will be used. If file_path does not contain a file name, then the natural trainee name, <uuid>.caml, will be used.
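The file_path rules for save() can be mirrored with pathlib. This is a minimal sketch: it assumes that a path without a .caml suffix names a directory (the text does not say how a missing file name is detected), resolve_save_path and its trainee_id argument are hypothetical, and nothing is written to disk.

```python
from __future__ import annotations

import os
from pathlib import Path


def resolve_save_path(file_path: str | os.PathLike | None, trainee_id: str, cwd: str | None = None) -> Path:
    """Compute where a save would land under the rules described for save()."""
    base = Path(cwd) if cwd is not None else Path.cwd()
    if file_path is None:
        target = base  # no path given: use the current working directory
    else:
        target = Path(file_path)
        if not target.is_absolute():
            target = base / target  # relative paths are appended to the CWD
    # Assumption: a path without a ".caml" suffix is treated as a directory,
    # so the natural trainee file name <trainee_id>.caml is appended.
    if target.suffix != ".caml":
        target = target / f"{trainee_id}.caml"
    return target
```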
- set_auto_ablation_params(auto_ablation_enabled=False, *, ablated_cases_distribution_batch_size=100, auto_ablation_weight_feature='.case_weight', batch_size=2000, conviction_lower_threshold=None, conviction_upper_threshold=None, exact_prediction_features=None, influence_weight_entropy_threshold=0.6, minimum_model_size=1000, relative_prediction_threshold_map=None, residual_prediction_features=None, tolerance_prediction_threshold_map=None, **kwargs)#
Set trainee parameters for auto-ablation.
Note
All ablation endpoints, including set_auto_ablation_params(), are experimental and may have their API changed without deprecation.
See also
The params influence_weight_entropy_threshold and auto_ablation_weight_feature that are set using this endpoint are used as defaults by reduce_data().
- Parameters:
auto_ablation_enabled (bool, default: False) – When True, the train() method will ablate cases that meet the set criteria.
ablated_cases_distribution_batch_size (int, default: 100) – Number of cases in a batch to distribute ablated cases’ influence weights.
auto_ablation_weight_feature (str, default: '.case_weight') – The weight feature that should be accumulated to when cases are ablated.
batch_size (int, default: 2000) – Number of cases in a batch to consider for ablation prior to training and to recompute influence weight entropy.
minimum_model_size (int, default: 1000) – The threshold for the minimum number of cases at which the model should auto-ablate.
influence_weight_entropy_threshold (float, default: 0.6) – The influence weight entropy quantile that a case must be beneath in order to be trained.
exact_prediction_features (Collection[str] | None, default: None) – For each of the features specified, will ablate a case if the prediction matches exactly.
residual_prediction_features (Collection[str] | None, default: None) – For each of the features specified, will ablate a case if abs(prediction - case value) / prediction <= feature residual.
tolerance_prediction_threshold_map (Mapping[str, tuple[float, float]] | None, default: None) – For each of the features specified, will ablate a case if the prediction >= (case value - MIN) and the prediction <= (case value + MAX).
relative_prediction_threshold_map (Mapping[str, float] | None, default: None) – For each of the features specified, will ablate a case if abs(prediction - case value) / prediction <= relative threshold.
conviction_lower_threshold (float | None, default: None) – The conviction value above which cases will be ablated.
conviction_upper_threshold (float | None, default: None) – The conviction value below which cases will be ablated.
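The per-feature prediction criteria accepted by set_auto_ablation_params() can be read as simple predicates. The sketch below evaluates the exact, tolerance and relative criteria for a single prediction. should_ablate is hypothetical, combining the criteria with a logical OR is an assumption, and the residual and conviction criteria are left out.

```python
from typing import Mapping, Optional, Sequence, Tuple


def should_ablate(
    feature: str,
    prediction: float,
    case_value: float,
    *,
    exact_prediction_features: Sequence[str] = (),
    tolerance_prediction_threshold_map: Optional[Mapping[str, Tuple[float, float]]] = None,
    relative_prediction_threshold_map: Optional[Mapping[str, float]] = None,
) -> bool:
    """Return True if any configured criterion for this feature is met (OR is assumed)."""
    # Exact criterion: the prediction matches the case value exactly.
    if feature in exact_prediction_features and prediction == case_value:
        return True
    # Tolerance criterion: case value - MIN <= prediction <= case value + MAX.
    if tolerance_prediction_threshold_map and feature in tolerance_prediction_threshold_map:
        min_delta, max_delta = tolerance_prediction_threshold_map[feature]
        if case_value - min_delta <= prediction <= case_value + max_delta:
            return True
    # Relative criterion, mirroring abs(prediction - case value) / prediction <= threshold.
    # A zero prediction is skipped here to avoid dividing by zero.
    if relative_prediction_threshold_map and feature in relative_prediction_threshold_map:
        if prediction != 0:
            ratio = abs(prediction - case_value) / prediction
            if ratio <= relative_prediction_threshold_map[feature]:
                return True
    return False
```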
- set_auto_analyze_params(auto_analyze_enabled=False, analyze_threshold=None, *, analyze_growth_factor=None, **kwargs)#
Set parameters for auto analysis.
Auto-analysis is disabled if this is called without specifying an analyze_threshold.
See also
The keyword arguments of analyze().
- Parameters:
auto_analyze_enabled (bool, default: False) – When True, the train() method will trigger an analyze when it’s time for the model to be analyzed again.
analyze_threshold (int | None, default: None) – The threshold for the number of cases at which the model should be re-analyzed.
analyze_growth_factor (float | None, default: None) – The factor by which to increase the analysis threshold every time the model grows to the current threshold size.
**kwargs – Accepts any of the keyword arguments in analyze().
- Return type:
None
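analyze_growth_factor scales the re-analysis threshold each time the model reaches it. Assuming the threshold is multiplied by the factor and rounded up to a whole number of cases (the rounding rule is a guess), the sequence of trigger points can be sketched as follows. analyze_thresholds is a hypothetical helper, and the check that the factor exceeds 1 exists only so the sketch terminates.

```python
import math
from typing import List


def analyze_thresholds(analyze_threshold: int, analyze_growth_factor: float, max_cases: int) -> List[int]:
    """Case counts at which auto-analysis would trigger, up to max_cases.

    Assumes each threshold is the previous one times the growth factor,
    rounded up to a whole number of cases.
    """
    if analyze_growth_factor <= 1:
        # Guard for this sketch only: a factor <= 1 would never grow the threshold.
        raise ValueError("analyze_growth_factor must be greater than 1 in this sketch")
    thresholds = []
    current = analyze_threshold
    while current <= max_cases:
        thresholds.append(current)
        current = math.ceil(current * analyze_growth_factor)
    return thresholds
```

Under these assumptions, a threshold of 100 with a growth factor of 7.389 would trigger analysis at 100, 739 and 5461 cases before exceeding 10,000.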
- set_feature_attributes(feature_attributes)#
Update the trainee feature attributes.
- Parameters:
feature_attributes (Mapping[str, Mapping] | SingleTableFeatureAttributes) – The feature attributes of the trainee, where each feature name is the key and a sub-dictionary of feature attributes is the value.
- Returns:
The updated feature attributes of the trainee.
- Return type:
- set_metadata(metadata)#
Update the trainee metadata.
- Parameters:
metadata (Mapping[str, Any] | None) – Any key-value pair to store as custom metadata for the trainee. Providing None will remove the current metadata.
- set_params(params)#
Set the workflow attributes for the trainee.
- Parameters:
params (Mapping[str, Any]) – A dictionary in the following format containing the hyperparameter information, which is required, and other parameters, which are all optional.
Example:
{
    "hyperparameter_map": {
        ".targetless": {
            "robust": {
                ".none": {"dt": -1, "p": .1, "k": 8}
            }
        }
    },
    "auto_analyze_enabled": False,
    "analyze_threshold": 100,
    "analyze_growth_factor": 7.389
}
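The hyperparameter_map in the example is nested several levels deep before reaching a set of hyperparameters such as k, p and dt. A small walker can list each set together with the keys leading to it. iter_hyperparameter_sets is a hypothetical helper; treating any mapping with no mapping values as a leaf is an assumption, as is any correspondence between these key paths and the ‘paramPath’ values returned by get_params().

```python
from collections.abc import Iterator, Mapping
from typing import Any, Tuple

params = {
    "hyperparameter_map": {
        ".targetless": {"robust": {".none": {"dt": -1, "p": 0.1, "k": 8}}}
    },
    "auto_analyze_enabled": False,
    "analyze_threshold": 100,
    "analyze_growth_factor": 7.389,
}


def iter_hyperparameter_sets(
    node: Mapping[str, Any], path: Tuple[str, ...] = ()
) -> Iterator[Tuple[Tuple[str, ...], Mapping[str, Any]]]:
    """Yield (key path, hyperparameters) for every leaf set in a nested map."""
    # Assumption: a leaf is a mapping none of whose values are mappings.
    if not any(isinstance(value, Mapping) for value in node.values()):
        yield path, node
        return
    for key, child in node.items():
        if isinstance(child, Mapping):
            yield from iter_hyperparameter_sets(child, path + (key,))
```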
- set_random_seed(seed)
Set the random seed for the trainee.
- Parameters:
seed (int | float | str) – The random seed.
- set_substitute_feature_values(substitution_value_map)
Set a substitution map for use in extended nominal generation.
- Parameters:
substitution_value_map (Mapping[str, Mapping[str, Any]]) – A dictionary mapping each feature name to a dictionary of feature value to substitute feature value.
If this dict is None, all substitutions will be disabled and cleared. If any feature in the substitution_value_map maps to None or {}, substitution values will immediately be generated.
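To make the nested shape of substitution_value_map concrete (feature name, then original value, then substitute value), the sketch below applies such a map to a single case locally. The helper substitute_case and all feature names and values are hypothetical; the engine performs substitution itself, and unlike the real method this sketch simply treats a None or {} entry as "no substitution" rather than generating values.

```python
from typing import Any, Mapping, Optional


def substitute_case(case: Mapping[str, Any],
                    substitution_value_map: Mapping[str, Optional[Mapping[str, Any]]]) -> dict[str, Any]:
    """Return a copy of `case` with nominal values replaced per the map.

    Hypothetical illustration of the map's shape: values with no entry in the
    feature's sub-dictionary, and features absent from the map, are unchanged.
    """
    result = dict(case)
    for feature, value in case.items():
        mapping = substitution_value_map.get(feature) or {}
        if value in mapping:
            result[feature] = mapping[value]
    return result


# Feature name -> {original value -> substitute value}; names are made up.
substitution_value_map = {
    "city": {"Raleigh": "City_A", "Durham": "City_B"},
    "color": {"red": "c1"},
}
print(substitute_case({"city": "Durham", "color": "blue", "age": 41}, substitution_value_map))
# {'city': 'City_B', 'color': 'blue', 'age': 41}
```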
- train(cases, *, accumulate_weight_feature=None, batch_size=None, derived_features=None, features=None, initial_batch_size=None, input_is_substituted=False, progress_callback=None, series=None, skip_auto_analyze=False, train_weights_only=False, validate=True)
Train one or more cases into the trainee (model).
- Parameters:
cases (DataFrame | list[list[Any]]) – One or more cases to train into the model.
accumulate_weight_feature (str | None, default: None) – Name of the feature into which to accumulate neighbors’ influences as weight for ablated cases. If unspecified, weights are not accumulated.
batch_size (int | None, default: None) – The number of cases to train at once. If unspecified, the batch size is determined automatically.
derived_features (Collection[str] | None, default: None) – List of feature names for which values should be derived, in the specified order. If this list is not provided, features with the ‘auto_derive_on_train’ feature attribute set to True will be auto-derived. If an empty list is provided, no features are derived. Any derived_features that are already in the ‘features’ list will not be derived, since their values are being explicitly provided.
features (Collection[str] | None, default: None) – A list of feature names. This parameter must be provided when cases is not a DataFrame with named columns. Otherwise, it can be provided when you do not want to train on all of the features in cases or you want to re-order the features in cases.
initial_batch_size (int | None, default: None) – The number of cases to train in the first batch. If unspecified, a default defined by the train_initial_batch_size property of the selected client will be used. The number of cases in subsequent batches will be adjusted automatically. This value is ignored if batch_size is specified.
input_is_substituted (bool, default: False) – If True, assumes provided nominal feature values have already been substituted.
progress_callback (Callable | None, default: None) – A callback method that will be called before each batched call to train and at the end of training. The method is given a ProgressTimer containing metrics on the progress and timing of the train operation.
series (str | None, default: None) – The name of the series to pull features and case values from internal series storage. If specified, trains on all cases that are stored in the internal series store for the specified series. The trained feature set is the combination of the features from storage and the passed-in features. If cases is of length one, the value(s) of this case are appended to all cases in the series. If cases is the same length as the series, the value of each case in cases is applied in order to each of the cases in the series.
skip_auto_analyze (bool, default: False) – When True, the Trainee will not auto-analyze when appropriate. Instead, the ‘needs_analyze’ property of the Trainee will be updated.
train_weights_only (bool, default: False) – When True and accumulate_weight_feature is provided, accumulates all of the cases’ neighbor weights instead of training the cases into the model.
validate (bool, default: True) – Whether to validate the data against the provided feature attributes. Issues warnings if there are any discrepancies between the data and the features dictionary.
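The series rules above (one case broadcast to every stored row, or one case per row when the lengths match) can be sketched with plain lists. This is an illustration under stated assumptions, not the engine's implementation: combine_with_series is hypothetical, "appended" is read as adding the passed features as extra columns after the stored ones, and other length combinations are simply rejected because the documentation does not describe them.

```python
from typing import Any, Sequence


def combine_with_series(series_cases: Sequence[Sequence[Any]],
                        series_features: Sequence[str],
                        cases: Sequence[Sequence[Any]],
                        features: Sequence[str]) -> tuple[list[str], list[list[Any]]]:
    """Combine stored series rows with passed-in case values (illustration only).

    - One passed case: its values are appended to every series row.
    - Same number of cases as series rows: case i is appended to series row i.
    Other lengths are rejected here; that behaviour is an assumption.
    """
    combined_features = list(series_features) + list(features)
    if len(cases) == 1:
        rows = [list(row) + list(cases[0]) for row in series_cases]
    elif len(cases) == len(series_cases):
        rows = [list(row) + list(case) for row, case in zip(series_cases, cases)]
    else:
        raise ValueError("cases must have length 1 or match the series length")
    return combined_features, rows


stored = [[1, 0.5], [2, 0.7], [3, 0.9]]  # hypothetical series rows: time, value
feats, rows = combine_with_series(stored, ["time", "value"], [["sensor_a"]], ["source"])
print(feats)  # ['time', 'value', 'source']
print(rows)   # [[1, 0.5, 'sensor_a'], [2, 0.7, 'sensor_a'], [3, 0.9, 'sensor_a']]
```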
- unload()
Unload the trainee.
Deprecated since version 1.0.0: Use release_resources() instead.
- update()
Update the remote trainee with local state.
- property calculated_matrices: dict[str, DataFrame] | None
The calculated matrices.
- Returns:
The calculated matrices.
- property client: AbstractHowsoClient | HowsoPandasClientMixin
The client instance used by the trainee.
- Returns:
The client instance.
- property features: SingleTableFeatureAttributes
The trainee feature attributes.
Warning
This returns a deep copy of the feature attributes. To update the feature attributes of the trainee, use the method set_feature_attributes().
- Returns:
The feature attributes of the trainee.
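Because features (and likewise metadata) returns a deep copy, editing the returned dictionary has no effect until it is written back with the setter. The toy class below is hypothetical and only mimics that documented copy semantics with the standard copy module; it is not the real Trainee, whose set_feature_attributes may merge or validate attributes differently.

```python
import copy
from typing import Any, Mapping


class ToyTrainee:
    """Hypothetical stand-in that mimics the documented copy semantics of `features`."""

    def __init__(self, features: Mapping[str, Mapping[str, Any]]):
        self._features = copy.deepcopy(dict(features))

    @property
    def features(self) -> dict[str, dict[str, Any]]:
        # Like Trainee.features, hand out a deep copy rather than internal state.
        return copy.deepcopy(self._features)

    def set_feature_attributes(self, feature_attributes: Mapping[str, Mapping[str, Any]]) -> dict[str, dict[str, Any]]:
        # The only supported way to change the stored attributes in this toy.
        self._features = copy.deepcopy(dict(feature_attributes))
        return self.features


trainee = ToyTrainee({"age": {"type": "continuous"}, "city": {"type": "nominal"}})

attrs = trainee.features
attrs["age"]["type"] = "ordinal"          # edits only the local copy
print(trainee.features["age"]["type"])    # continuous

trainee.set_feature_attributes(attrs)     # write the change back explicitly
print(trainee.features["age"]["type"])    # ordinal
```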
- property metadata: MutableMapping[str, Any] | None
The trainee metadata.
Warning
This returns a deep copy of the metadata. To update the metadata of the trainee, use the method set_metadata().
- Returns:
The metadata of the trainee.
- property name: str | None
The name of the Trainee.
- Returns:
The Trainee’s name.
- property needs_analyze: bool
The flag indicating whether the Trainee needs to analyze.
- Returns:
A flag indicating whether the Trainee needs to analyze.
- property persistence: Literal['allow', 'always', 'never']
The persistence state of the Trainee.
- Returns:
The Trainee’s persistence value.
- property save_location: str | PathLike | None
The current storage location of the trainee.
- Returns:
The current storage location of the trainee, based on the last saved location or the location from which the trainee was loaded. If the trainee was not saved to or loaded from a custom location, the default save location is returned.
- howso.engine.delete_project(project_id, *, client=None)
Delete an existing project.
Projects may only be deleted when they have no trainees in them.
- Parameters:
project_id (str | UUID) – The id of the project.
client (ProjectClient | None, default: None) – The Howso client instance to use.
- Return type:
None
- howso.engine.delete_trainee(name_or_id=None, file_path=None, client=None)
Delete an existing Trainee.
Loaded trainees exist in memory while also potentially existing on disk. This is a convenience function that allows the deletion of Trainees from both memory and disk.
- Parameters:
name_or_id (str | None, default: None) – The name or id of the trainee. Deletes the Trainee from memory and, if no file_path is provided, attempts to delete a Trainee saved under the same filename from the default save location.
file_path (str | PathLike | None, default: None) – The path of the saved Trainee file. Used for deleting trainees from disk.
The file path must end with a filename, but it can be an absolute path, a relative path, or just the file name.
If name_or_id is not provided, then in addition to deleting from disk, the function will attempt to delete a Trainee from memory, assuming the Trainee has the same name as the filename.
If file_path is a relative path, the absolute path will be computed by appending the file_path to the CWD.
If file_path is an absolute path, this is the absolute path that will be used.
If file_path is just a filename, the absolute path will be computed by appending the filename to the CWD.
client (AbstractHowsoClient | None, default: None) – The Howso client instance to use.
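The file_path rules above can be mirrored with pathlib: relative paths and bare filenames are joined to the current working directory, while absolute paths are used as given. The helper resolve_trainee_file is hypothetical, and the inferred in-memory name assumes the file extension is dropped, which the description ("the same name as the filename") does not state explicitly.

```python
import os
from pathlib import Path
from typing import Tuple, Union


def resolve_trainee_file(file_path: Union[str, os.PathLike]) -> Tuple[Path, str]:
    """Resolve a trainee file path per the rules above and guess the in-memory name.

    Hypothetical helper. Relative paths and bare filenames are joined to the CWD;
    absolute paths are used unchanged. The inferred name (the file stem) is an
    assumption about how the filename maps to a Trainee name.
    """
    path = Path(file_path)
    if not path.name:
        raise ValueError("file_path must end with a filename")
    absolute = path if path.is_absolute() else Path.cwd() / path
    return absolute, path.stem


print(resolve_trainee_file("models/my_trainee.caml"))
```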
- howso.engine.get_active_session(*, client=None)
Get the active session.
- Parameters:
client (AbstractHowsoClient | None, default: None) – The Howso client instance to use.
- Returns:
The session instance.
- Return type:
Session
- howso.engine.get_client()
Get the active Howso client instance.
- Returns:
The active client.
- Return type:
AbstractHowsoClient
- howso.engine.get_project(project_id, *, client=None)
Get an existing project.
- Parameters:
project_id (str | UUID) – The id of the project.
client (ProjectClient | None, default: None) – The Howso client instance to use.
- Returns:
The project instance.
- Return type:
Project
- howso.engine.get_session(session_id, *, client=None)
Get an existing Session.
- Parameters:
session_id (str | UUID) – The id of the session.
client (AbstractHowsoClient | None, default: None) – The Howso client instance to use.
- Returns:
The session instance.
- Return type:
Session
- howso.engine.get_trainee(name_or_id, *, client=None)
Get an existing trainee from Howso Services.
- Parameters:
name_or_id (str) – The name or id of the trainee.
client (AbstractHowsoClient | None, default: None) – The Howso client instance to use.
- Returns:
The Trainee instance.
- Raises:
HowsoError – If the Trainee could not be found.
- Return type:
Trainee
- howso.engine.list_projects(*args, **kwargs)
Query accessible Projects.
DEPRECATED: Use query_projects instead.
- Return type:
list[Project]
- howso.engine.list_sessions(*args, **kwargs)
Query accessible Sessions.
DEPRECATED: Use query_sessions instead.
- Return type:
list[Session]
- howso.engine.list_trainees(*args, **kwargs)
Query accessible Trainees.
DEPRECATED: Use query_trainees instead.
- howso.engine.load_trainee(file_path, client=None)
Load an existing trainee from disk.
- Parameters:
file_path (str | PathLike) – The path of the file to load the Trainee from. This path can be an absolute path, a relative path, or simply a file name. A .caml file name must always be provided if file paths are provided.
If file_path is a relative path, the absolute path will be computed by appending the file_path to the CWD.
If file_path is an absolute path, this is the absolute path that will be used.
If file_path is just a filename, the absolute path will be computed by appending the filename to the CWD.
client (AbstractHowsoClient | None, default: None) – The Howso client instance to use. Must have local disk access.
- Returns:
The trainee instance.
- Return type:
Trainee
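Since load_trainee expects the path to name a .caml file, a caller may want to catch a wrong extension before contacting the client. The sketch below is a hypothetical pre-flight check using pathlib; it compares the suffix exactly because the documentation does not say whether the check is case-insensitive, and load_trainee applies its own validation regardless.

```python
import os
from pathlib import Path
from typing import Union


def check_caml_path(file_path: Union[str, os.PathLike]) -> Path:
    """Reject paths that do not name a .caml file before calling load_trainee().

    Hypothetical pre-flight check; the exact-case comparison is an assumption.
    """
    path = Path(file_path)
    if path.suffix != ".caml":
        raise ValueError(f"expected a .caml file name, got {str(path)!r}")
    return path


# from howso import engine
# trainee = engine.load_trainee(check_caml_path("saved/my_trainee.caml"))
```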
- howso.engine.query_projects(search_terms=None, *, client=None)
Query accessible Projects.
- Parameters:
search_terms (str | None, default: None) – Terms to filter results by.
client (ProjectClient | None, default: None) – The Howso client instance to use.
- Returns:
The list of project instances.
- Return type:
list[Project]
- howso.engine.query_sessions(search_terms=None, *, client=None, project=None, trainee=None)
Query accessible Sessions.
- Parameters:
search_terms (str | None, default: None) – Terms to filter results by.
client (AbstractHowsoClient | None, default: None) – The Howso client instance to use.
project (str | Project | None, default: None) – The instance or id of a project to filter by. Ignored if the client does not support projects.
trainee (str | Trainee | None, default: None) – The instance or id of a Trainee to filter by.
- Returns:
The list of session instances.
- Return type:
list[Session]
- howso.engine.query_trainees(search_terms=None, *, client=None, project=None)
Query accessible Trainees.
This method only returns a simplified informational listing of available trainees, not full engine Trainee instances. To get a Trainee instance that can be used with the engine API, call get_trainee.
- Parameters:
search_terms (str | None, default: None) – Terms to filter results by.
client (AbstractHowsoClient | None, default: None) – The Howso client instance to use.
project (str | Project | None, default: None) – The instance or id of a project to filter by.
- Returns:
The list of available trainees.
- Return type:
list[dict]
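The two-step pattern described above (query a lightweight listing, then fetch full Trainee objects) can be sketched with the two functions injected as arguments, so the example runs without a Howso server. The helper load_matching_trainees is hypothetical, and so is the assumption that each listing entry exposes the trainee's id under an "id" key.

```python
from typing import Any, Callable, Mapping, Optional, Sequence


def load_matching_trainees(
    query_fn: Callable[[Optional[str]], Sequence[Mapping[str, Any]]],
    get_fn: Callable[[str], Any],
    search_terms: Optional[str] = None,
) -> list[Any]:
    """Turn the lightweight listing from a query into full Trainee objects.

    `query_fn` and `get_fn` are injected so the sketch runs without a server;
    in practice they would be howso.engine.query_trainees and
    howso.engine.get_trainee. The "id" key is an assumption about the listing.
    """
    listing = query_fn(search_terms)
    return [get_fn(entry["id"]) for entry in listing]


# from howso import engine
# trainees = load_matching_trainees(engine.query_trainees, engine.get_trainee, "sales")
```

Injecting the callables keeps the logic testable with fakes and makes the id-key assumption easy to adjust if the real listing uses a different field.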
- howso.engine.switch_project(project_id, *, client=None)
Set the active project.
- Parameters:
project_id (str | UUID) – The id of the project.
client (ProjectClient | None, default: None) – The Howso client instance to use.
- Returns:
The newly active project instance.
- Return type:
Project
- howso.engine.use_client(client)
Set the active Howso client instance to use for the API.
- Parameters:
client (AbstractHowsoClient) – The client instance.
- Raises:
ValueError – When the client is not an instance of AbstractHowsoClient.
- Return type:
None