howso.direct#

Classes

HowsoCore

Howso Core API.

HowsoDirectClient

The direct Howso client.

The Python API for the Howso Direct Client.

class howso.direct.HowsoCore(library_path=None, gc_interval=100, howso_path=DEFAULT_CORE_PATH, howso_fname='howso.caml', trace=False, sbf_datastore_enabled=True, max_num_threads=0, **kwargs)#

Bases: object

Howso Core API.

This class is used in conjunction with the Amalgam Python interface to interact with the Howso Core and Amalgam binaries.

Parameters:
  • handle (str) – Handle for the Howso entity. If none is provided a random 6 digit alphanumeric handle will be assigned.

  • library_path (str, optional) – Path to Amalgam library.

  • gc_interval (int, default 100) – Number of Amalgam operations to perform before forcing garbage collection. Lower is better at memory management but compromises performance. Higher is better performance but may result in higher memory usage.

  • howso_path (str, default DEFAULT_CORE_PATH) – Directory path to the Howso caml files.

  • howso_fname (str, default "howso.caml") – Name of the Howso caml file with extension.

  • trace (bool, default False) – If true, sets debug flag for amlg operations. This will generate an execution trace useful in debugging with the standard name of howso_[random 6 byte hex]_execution.trace.

  • sbf_datastore_enabled (bool, default True) – If true, sbf tree structures are enabled.

  • max_num_threads (int, default 0) – If a multithreaded Amalgam binary is used, sets the maximum number of threads to the value specified. If 0, will use the number of visible logical cores.
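The max_num_threads rule can be sketched in plain Python. This is only an illustration of the documented behaviour, not the library's own code: resolve_thread_count is a hypothetical helper, and os.cpu_count() is an assumed stand-in for "the number of visible logical cores".

```python
import os


def resolve_thread_count(max_num_threads: int = 0) -> int:
    """Hypothetical helper mirroring the documented meaning of max_num_threads."""
    if max_num_threads == 0:
        # 0 means "use every visible logical core". os.cpu_count() is an
        # assumption here; it can return None, so fall back to 1.
        return os.cpu_count() or 1
    return max_num_threads


print(resolve_thread_count(4))
print(resolve_thread_count(0))
```

The setting only matters when a multithreaded Amalgam binary is in use, as noted in the parameter description.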

add_feature(trainee_id, feature, feature_value=None, *, condition=None, condition_session=None, feature_attributes=None, overwrite=False, session=None)#

Add a feature.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • feature (str) – The feature name.

  • feature_value (int or float or str, optional) – The feature value.

  • condition (str, optional) – A condition map where the feature will only be added to cases that meet the specified criteria.

  • condition_session (str, optional) – If specified, ignores the condition parameter and operates on cases for the specified session id.

  • overwrite (bool, default False) – If True, the feature will be over-written if it exists.

  • session (str, optional) – The identifier of the Trainee session to associate the feature addition with.

Return type:

None

analyze(trainee_id, **kwargs)#

Analyzes a trainee.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • kwargs (dict) – Analysis arguments.

Return type:

None

append_to_series_store(trainee_id, series, contexts, *, context_features=None)#

Append the specified contexts to a series store.

For use with train series.

Parameters:
  • trainee_id (str) – The ID of the Trainee to append to.

  • series (str) – The name of the series store to append to.

  • contexts (list of list of object) – The list of list of context values to append to the series.

  • context_features (iterable of str, optional) – The list of feature names for contexts.

Return type:

None

auto_analyze(trainee_id)#

Auto-analyze the Trainee model.

Parameters:

trainee_id (str) – The identifier of the Trainee.

Return type:

None

auto_analyze_params(trainee_id, auto_analyze_enabled=False, analyze_threshold=None, auto_analyze_limit_size=None, analyze_growth_factor=None, **kwargs)#

Set trainee parameters for auto analysis.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • auto_analyze_enabled (bool, default False) – Enable auto analyze when training. Train will return a status indicating when to auto analyze.

  • analyze_threshold (int, optional) – The threshold for the number of cases at which the model should be re-analyzed.

  • auto_analyze_limit_size (int, optional) – The size of the model at which to stop doing auto-analysis. Value of 0 means no limit.

  • analyze_growth_factor (float, optional) – The factor by which to increase the analyze threshold every time the model grows to the current threshold size.

  • kwargs (dict, optional) – Parameters specific for analyze() may be passed in via kwargs, and will be cached and used during future auto-analysis.

Return type:

None
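One way to read how analyze_threshold, analyze_growth_factor and auto_analyze_limit_size interact is sketched below. This is an interpretation of the parameter descriptions above, not the engine's implementation: analyze_schedule and its max_events cap are hypothetical, and truncating the grown threshold to an integer is an assumption.

```python
def analyze_schedule(analyze_threshold, analyze_growth_factor,
                     auto_analyze_limit_size, max_events=10):
    """Return the model sizes at which an auto-analyze would be triggered."""
    schedule = []
    threshold = analyze_threshold
    while len(schedule) < max_events:
        # A limit of 0 means "no limit"; otherwise stop once the next
        # threshold would exceed the limit size.
        if auto_analyze_limit_size and threshold > auto_analyze_limit_size:
            break
        schedule.append(threshold)
        # Each time the model reaches the threshold, the threshold grows
        # by the growth factor (integer truncation is an assumption).
        threshold = int(threshold * analyze_growth_factor)
    return schedule


print(analyze_schedule(100, 2.0, 1000))
```

Under these assumptions, a growth factor greater than 1 makes re-analysis progressively rarer as the model grows.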

batch_react(trainee_id, *, action_features=None, action_values=None, allow_nulls=False, case_indices=None, context_features=None, context_values=None, derived_action_features=None, derived_context_features=None, desired_conviction=None, details=None, exclude_novel_nominals_from_uniqueness_check=False, extra_features=None, feature_bounds_map=None, generate_new_cases='no', input_is_substituted=False, into_series_store=None, leave_case_out=False, new_case_threshold='min', num_cases_to_generate=None, ordered_by_specified_features=False, post_process_features=None, post_process_values=None, preserve_feature_values=None, substitute_output=True, use_case_weights=False, use_regional_model_residuals=True, weight_feature=None)#

Multiple case react.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • context_values (list of list of object, optional) – A 2d list of context values to react to. If None for discriminative react, it is assumed that session and session_id keys are set in the details.

  • action_features (iterable of str, optional) – An iterable of feature names to treat as action features during react.

  • action_values (list of list of object, optional) – One or more action values to use for action features. If specified, will only return the specified explanation details for the given actions. (Discriminative reacts only)

  • allow_nulls (bool, default False) – When true will allow return of null values if there are nulls in the local model for the action features, applicable only to discriminative reacts.

  • context_features (iterable of str, optional) – An iterable of feature names to treat as context features during react.

  • derived_context_features (iterable of str, optional) – An iterable of feature names whose values should be computed from the provided context in the specified order. Must be different than context_features.

  • derived_action_features (iterable of str, optional) – An iterable of feature names whose values should be computed after generation from the generated case prior to output, in the specified order. Must be a subset of action_features.

  • exclude_novel_nominals_from_uniqueness_check (bool, default False) – If True, will exclude features which have a subtype defined in their feature attributes from the uniqueness check that happens when generate_new_cases is True. Only applies to generative reacts.

  • input_is_substituted (bool, default False) – If True, assumes provided categorical (nominal or ordinal) feature values have already been substituted.

  • substitute_output (bool, default True) – If False, will not substitute categorical feature values. Only applicable if a substitution value map has been set.

  • details (dict, optional) – If details are specified, the response will contain the requested explanation data along with the reaction.

  • desired_conviction (float) – If specified, will execute a generative react. If not specified, will execute a discriminative react. Conviction is the ratio of expected surprisal to generated surprisal for each feature generated; valid values are in the range of zero to infinity.

  • weight_feature (str, optional) – Name of feature whose values to use as case weights. When left unspecified uses the internally managed case weight.

  • use_case_weights (bool, default False) – If set to True will scale influence weights by each case’s weight_feature weight.

  • case_indices (Iterable of Sequence[Union[str, int]], defaults to None) – An Iterable of Sequences, of session id and index, where index is the original 0-based index of the case as it was trained into the session. If this case does not exist, discriminative react outputs null, generative react ignores it.

  • post_process_features (iterable of str, optional) – List of feature names that will be made available during the execution of post_process feature attributes.

  • post_process_values (list of list of object, optional) – A 2d list of values corresponding to post_process_features that will be made available during the execution of post_process feature attributes.

  • preserve_feature_values (iterable of str) – List of features that will preserve their values from the case specified by case_indices, appending and overwriting the specified contexts as necessary. For generative reacts, if case_indices isn’t specified will preserve feature values of a random case.

  • leave_case_out (bool, default False) – If set to True and specified along with case_indices, each individual react will respectively ignore the corresponding case specified by case_indices by leaving it out.

  • into_series_store (str, optional) – The name of a series store. If specified, will store an internal record of all react contexts for this session and series to be used later with train series.

  • use_regional_model_residuals (bool) – If False, uses model feature residuals; if True, recalculates regional model residuals.

  • feature_bounds_map (dict of dict) – A mapping of feature names to the bounds for the feature values to be generated in.

  • generate_new_cases ({"always", "attempt", "no"}, default "no") – How to generate new cases.

  • ordered_by_specified_features (bool, default False) – If True order of generated feature values will match the order of specified features.

  • num_cases_to_generate (int, default 1) – The number of cases to generate.

  • new_case_threshold ({"min", "max", "most_similar"}, optional) – Distance to determine the privacy cutoff. If None, will default to “min”.

Return type:

Tuple[Dict, int, int]

Returns:

  • dict – The react result including audit details.

  • int – The request payload size.

  • int – The result payload size.
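The shape of the batch inputs can be illustrated without calling the engine. The sketch below uses made-up feature names and values; rows_match_features and react_mode are hypothetical helpers that only restate the parameter notes above (a 2d context_values list aligned with context_features, and desired_conviction selecting a generative react).

```python
# Hypothetical feature names and values, for illustration only.
context_features = ["age", "income"]
context_values = [
    [34, 52000],  # first case to react to
    [58, 71000],  # second case to react to
]


def rows_match_features(features, values):
    """Return True when every row supplies exactly one value per feature."""
    return all(len(row) == len(features) for row in values)


def react_mode(desired_conviction=None):
    """Per the parameter notes, a conviction value selects a generative react."""
    return "discriminative" if desired_conviction is None else "generative"


# Keyword arguments shaped the way the signature above expects them.
react_kwargs = {
    "context_features": context_features,
    "context_values": context_values,
    "action_features": ["risk"],
}

print(rows_match_features(context_features, context_values))
print(react_mode(), react_mode(desired_conviction=1.0))
```

Given the documented return type, a call made with these keyword arguments would unpack into three items: the react result dict, the request payload size and the result payload size.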

batch_react_group(trainee_id, new_cases, *, features=None, distance_contributions=False, familiarity_conviction_addition=True, familiarity_conviction_removal=False, kl_divergence_addition=False, kl_divergence_removal=False, p_value_of_addition=False, p_value_of_removal=False, weight_feature=None, use_case_weights=False)#

Computes specified data for a set of cases.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • new_cases (list of list of list of object or list) – The groups of cases to compute conviction for. Each group is a list of cases, and each case is a list of feature values.

  • features (iterable of str, optional) – An iterable of feature names to consider while calculating convictions.

  • distance_contributions (bool, default False) – Calculate and output distance contribution ratios in the output dict for each case.

  • familiarity_conviction_addition (bool, default True) – Calculate and output familiarity conviction of adding the specified cases.

  • familiarity_conviction_removal (bool, default False) – Calculate and output familiarity conviction of removing the specified cases.

  • kl_divergence_addition (bool, default False) – Calculate and output KL divergence of adding the specified cases.

  • kl_divergence_removal (bool, default False) – Calculate and output KL divergence of removing the specified cases.

  • p_value_of_addition (bool, default False) – If true will output p value of addition.

  • p_value_of_removal (bool, default False) – If true will output p value of removal.

  • weight_feature (str, optional) – Name of feature whose values to use as case weights. When left unspecified uses the internally managed case weight.

  • use_case_weights (bool, default False) – If set to True will scale influence weights by each case’s weight_feature weight.

Returns:

The react response.

Return type:

dict
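The nesting expected by new_cases is easiest to see in a small example. The values and feature names below are made up; the only point is the three levels of lists (groups, cases, feature values) described by the parameter type.

```python
# Hypothetical feature names; each case lists its values in this order.
features = ["f1", "f2", "f3"]

new_cases = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],  # group 1: three cases
    [[1, 2, 3]],                        # group 2: a single case
]

# One entry per group, counting the cases it contains.
group_sizes = [len(group) for group in new_cases]

# Every case should supply one value per feature.
well_formed = all(
    len(case) == len(features) for group in new_cases for case in group
)

print(group_sizes, well_formed)
```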

batch_react_series(trainee_id, *, action_features=None, action_values=None, case_indices=None, context_values=None, context_features=None, continue_series=False, continue_series_features=None, continue_series_values=None, derived_action_features=None, derived_context_features=None, desired_conviction=None, details=None, exclude_novel_nominals_from_uniqueness_check=False, extra_features=None, feature_bounds_map=None, final_time_steps=None, generate_new_cases='no', init_time_steps=None, initial_features=None, initial_values=None, input_is_substituted=False, leave_case_out=False, max_series_lengths=None, new_case_threshold='min', num_series_to_generate=1, ordered_by_specified_features=False, output_new_series_ids=True, preserve_feature_values=None, series_context_features=None, series_context_values=None, series_id_tracking='fixed', series_stop_maps=None, substitute_output=True, use_case_weights=False, use_regional_model_residuals=True, weight_feature=None)#

React in a series until a series_stop_map condition is met.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • num_series_to_generate (int, optional) – The number of series to generate.

  • final_time_steps (list of object, optional) – The time steps at which to end synthesis. Time-series only. Must provide either one for all series, or exactly one per series.

  • init_time_steps (list of object, optional) – The time steps at which to begin synthesis. Time-series only. Must provide either one for all series, or exactly one per series.

  • initial_features (iterable of str, optional) – List of features to condition just the first case in a series, overwrites context_features and derived_context_features for that first case. All specified initial features must be in one of: context_features, action_features, derived_context_features or derived_action_features. If provided a value that isn’t in one of those lists, it will be ignored.

  • initial_values (list of list of object, optional) – 2d list of values corresponding to the initial_features, used to condition just the first case in each series. Must provide either one for all series, or exactly one per series.

  • series_stop_maps (list of dict of dict, optional) – A dictionary of feature name to stop conditions. Must provide either one for all series, or exactly one per series.

  • max_series_lengths (list of int, optional) – The maximum size a series is allowed to be. Default is 3 * model_size; a value of 0 or less means no limit. If forecasting with continue_series, this defines the maximum length of the forecast. Must provide either one for all series, or exactly one per series.

  • continue_series (bool, default False) – When True, will attempt to continue existing series instead of starting new series. If initial_values provide series IDs, it will continue those explicitly specified IDs; otherwise it will randomly select series to continue. Note: terminated series with terminators cannot be continued and will result in null output.

  • continue_series_features (list of str, optional) – The list of feature names corresponding to the values in each row of continue_series_values. This value is ignored if continue_series_values is None.

  • continue_series_values (list of list of list of object or list of pandas.DataFrame, default None) – The set of series data to be forecasted, with feature values in the same order defined by continue_series_features. The value of continue_series will be ignored and treated as true if this value is specified.

  • derived_context_features (iterable of str, optional) – List of context features whose values should be computed from the entire series in the specified order. Must be different than context_features.

  • derived_action_features (iterable of str, optional) – List of action features whose values should be computed from the resulting last row in series, in the specified order. Must be a subset of action_features.

  • series_context_features (iterable of str, optional) – List of context features corresponding to series_context_values, if specified must not overlap with any initial_features or context_features.

  • series_context_values (list of list of list of object or list, optional) – 3d-list of context values, one for each feature for each row for each series. If specified, max_series_lengths are ignored.

  • output_new_series_ids (bool, default True) – If True, series ids are replaced with unique values on output. If False, will maintain or replace ids with existing trained values, but also allows output of series with duplicate existing ids.

  • series_id_tracking ({"dynamic", "fixed", "no"}, default "fixed") – Controls how closely generated series should follow existing series.

  • context_values (list of list of object) – See parameter contexts in react().

  • action_features (iterable of str) – See parameter action_features in react().

  • action_values (list of list of object) – See parameter actions in react().

  • context_features (iterable of str) – See parameter context_features in react().

  • input_is_substituted (bool, default False) – See parameter input_is_substituted in react().

  • substitute_output (bool) – See parameter substitute_output in react().

  • details (dict, optional) – See parameter details in react().

  • desired_conviction (float) – See parameter desired_conviction in react().

  • exclude_novel_nominals_from_uniqueness_check (bool, default False) – See parameter exclude_novel_nominals_from_uniqueness_check in react().

  • weight_feature (str) – See parameter weight_feature in react().

  • use_case_weights (bool) – See parameter use_case_weights in react().

  • case_indices (iterable of sequence of str, int) – See parameter case_indices in react().

  • preserve_feature_values (iterable of str) – See parameter preserve_feature_values in react().

  • new_case_threshold (str) – See parameter new_case_threshold in react().

  • leave_case_out (bool) – See parameter leave_case_out in react().

  • use_regional_model_residuals (bool) – See parameter use_regional_model_residuals in react().

  • feature_bounds_map (dict of dict) – See parameter feature_bounds_map in react().

  • generate_new_cases ({"always", "attempt", "no"}) – See parameter generate_new_cases in react().

  • ordered_by_specified_features (bool) – See parameter ordered_by_specified_features in react().

Return type:

Tuple[Dict, int, int]

Returns:

  • dict – A dictionary with keys action_features and series, where series is a 2d list of values (rows of data per series) and action_features is the list of all action features (specified and derived).

  • int – The request payload size.

  • int – The result payload size.
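Several parameters above share the rule "must provide either one for all series, or exactly one per series". A minimal sketch of that rule follows; expand_per_series is a hypothetical helper written for illustration and is not part of the client.

```python
def expand_per_series(values, num_series, name="values"):
    """Expand a per-series argument to exactly one entry per series."""
    if values is None:
        return None
    if len(values) == 1:
        # One entry applies to every series. The same object is repeated,
        # so mutable entries (e.g. dicts) are shared between series.
        return list(values) * num_series
    if len(values) == num_series:
        return list(values)
    raise ValueError(
        f"{name} must contain either 1 or {num_series} entries, "
        f"got {len(values)}"
    )


print(expand_per_series([50], 3, name="max_series_lengths"))
print(expand_per_series([10, 20, 30], 3, name="max_series_lengths"))
```

The same check would apply to final_time_steps, init_time_steps, initial_values and series_stop_maps, which carry the same wording.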

clean_data(trainee_id, context_features=None, action_features=None, remove_duplicates=None)#

Cleans up Trainee data.

Removes unused sessions, cases or actions missing data, etc. If a trainee identifier is not specified, it will look to the entity’s own label of the same name.

Parameters:
  • trainee_id (str) – The identifier of the Trainee to clean.

  • context_features (list of str, optional) – Only remove cases that don’t have specified context features.

  • action_features (list of str, optional) – Only remove cases that don’t have specified action features.

  • remove_duplicates (bool, default False) – If true, will remove duplicate cases (cases with identical values).

Return type:

None

clear_conviction_thresholds(trainee_id)#

Set the conviction thresholds to null.

Parameters:

trainee_id (str) – The identifier of the Trainee.

Return type:

None

clear_imputed_session(trainee_id, impute_session, *, session=None)#

Clear values that were imputed during a specified session.

Won’t clear values that were manually set by user after the impute.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • impute_session (str) – The impute session to clear.

  • session (str, optional) – The identifier of the Trainee session to associate this edit with.

Return type:

None

compute_conviction_of_features(trainee_id, *, features=None, action_features=None, familiarity_conviction_addition=True, familiarity_conviction_removal=False, weight_feature=None, use_case_weights=False)#

Get familiarity conviction for features in the model.

Parameters:
  • trainee_id (str) – The id of the trainee.

  • features (iterable of str, optional) – An iterable of feature names to calculate convictions. At least 2 features are required to get familiarity conviction. If not specified all features will be used.

  • action_features (iterable of str, optional) – An iterable of feature names to be treated as action features during conviction calculation in order to determine the conviction of each feature against the set of action_features. If not specified, conviction is computed for each feature against the rest of the features as a whole.

  • familiarity_conviction_addition (bool, default True) – Calculate and output familiarity conviction of adding the specified features in the output.

  • familiarity_conviction_removal (bool, default False) – Calculate and output familiarity conviction of removing the specified features in the output.

  • weight_feature (str, optional) – Name of feature whose values to use as case weights. When left unspecified uses the internally managed case weight.

  • use_case_weights (bool, default False) – If set to True will scale influence weights by each case’s weight_feature weight.

Returns:

A dict with familiarity_conviction_addition or familiarity_conviction_removal.

Return type:

dict

compute_feature_weights(trainee_id, action_feature=None, context_features=None, robust=False, weight_feature=None, use_case_weights=False)#

Compute feature weights for specified context and action features.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • action_feature (str, optional) – Action feature for which to compute the feature weights.

  • context_features (iterable of str) – List of context feature names.

  • robust (bool, default False) – When true, the power set/permutations of features are used as contexts to calculate the residual for a given feature. When false, the full set of features is used to calculate the residual for a given feature.

  • weight_feature (str, optional) – Name of feature whose values to use as case weights. When left unspecified uses the internally managed case weight.

  • use_case_weights (bool, default False) – If set to True will scale influence weights by each case’s weight_feature weight.

Returns:

A dictionary of computed context features -> weights.

Return type:

dict
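To make the robust option more concrete, the sketch below enumerates the power set of the remaining features for one target feature. It only illustrates what "power set of features used as contexts" means; how the engine actually samples or weights these subsets is not described here, and context_subsets is a hypothetical helper.

```python
from itertools import chain, combinations


def context_subsets(features, target):
    """List every subset of the features other than the target feature."""
    others = [f for f in features if f != target]
    subsets = chain.from_iterable(
        combinations(others, size) for size in range(len(others) + 1)
    )
    return [list(subset) for subset in subsets]


# Hypothetical feature names.
features = ["a", "b", "c"]

robust_contexts = context_subsets(features, target="c")
# With robust=False, only the full set of other features is used.
full_context = [f for f in features if f != "c"]

print(robust_contexts)
print(full_context)
```

The number of subsets doubles with each added feature, which is why enumerating them exhaustively is only practical for small feature sets.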

copy(trainee_id, target_trainee_id)#

Copy the contents of one Trainee into another.

Parameters:
  • trainee_id (str) – The identifier of the Trainee to copy from.

  • target_trainee_id (str) – The identifier of the Trainee to copy into.

Returns:

A dict containing the name of the trainee that was created by copy.

Return type:

dict

copy_subtrainee(trainee_id, new_trainee_name, source_id=None, source_name_path=None, target_id=None, target_name_path=None)#

Copy a subtrainee in trainee’s hierarchy.

Parameters:
  • trainee_id (str) – The id of the trainee whose hierarchy is to be modified.

  • new_trainee_name (str) – The name of the new Trainee.

  • source_id (str, optional) – Id of source trainee to copy. Ignored if source_name_path is specified. If neither source_name_path nor source_id are specified, copies the trainee itself.

  • source_name_path (list of str, optional) – list of strings specifying the user-friendly path of the child subtrainee to copy.

  • target_id (str, optional) – Id of target trainee to copy trainee into. Ignored if target_name_path is specified. If neither target_name_path nor target_id are specified, copies as a direct child of trainee.

  • target_name_path (list of str, optional) – List of strings specifying the user-friendly path of the child subtrainee to copy trainee into.

Return type:

None

create_subtrainee(trainee_id, trainee_name, subtrainee_id=None)#

Create a subtrainee.

Parameters:
  • trainee_id (str) – The identifier of the Trainee to be modified.

  • trainee_name (str) – Name of subtrainee to create.

  • subtrainee_id (str, optional) – Unique id for subtrainee.

Returns:

A dict containing the name of the subtrainee that was created.

Return type:

dict

create_trainee(trainee_id)#

Create a Trainee.

Parameters:

trainee_id (str) – The identifier of the Trainee to create.

Returns:

A dict containing the name of the trainee that was created.

Return type:

dict

static default_library_ext()#

Returns the default library extension based on runtime os.

Return type:

str

delete(trainee_id)#

Delete a Trainee.

Parameters:

trainee_id (str) – The identifier of the Trainee to delete.

Return type:

None

delete_subtrainee(trainee_id, trainee_name)#

Delete a child subtrainee.

Parameters:
  • trainee_id (str) – The id of the trainee whose hierarchy is to be modified.

  • trainee_name (str) – The name of the subtrainee to be deleted.

Return type:

None

distances(trainee_id, features=None, *, action_feature=None, case_indices=None, feature_values=None, use_case_weights=False, weight_feature=None, row_offset=0, row_count=None, column_offset=0, column_count=None)#

Compute distances matrix for specified cases.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • features (iterable of str, optional) – List of feature names to use when computing distances. If unspecified uses all features.

  • action_feature (str, optional) – The action feature. If specified, uses targeted hyperparameters used to predict this action_feature, otherwise uses targetless hyperparameters.

  • case_indices (Iterable of Sequence[Union[str, int]], optional) – An Iterable of Sequences, of session id and index, where index is the original 0-based index of the case as it was trained into the session. If specified, returns distances for all of these cases. Ignored if feature_values is provided. If neither feature_values nor case_indices is specified, uses full dataset.

  • feature_values (list of object, optional) – If specified, returns distances of the local model relative to these values, ignores case_indices parameter.

  • use_case_weights (bool, default False) – If set to True, will scale influence weights by each case’s weight_feature weight.

  • weight_feature (str, optional) – Name of feature whose values to use as case weights. When left unspecified uses the internally managed case weight.

Returns:

A dictionary of keys, ‘distances’, ‘row_case_indices’ and ‘column_case_indices’.

Return type:

dict
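Reading a value out of the returned dictionary might look like the sketch below. The three keys come from the description above; the concrete layout of the index lists (pairs of session id and index) and every value shown are assumptions made for illustration.

```python
# Hypothetical result, shaped after the three documented keys.
result = {
    "row_case_indices": [["session-a", 0], ["session-a", 1]],
    "column_case_indices": [["session-a", 0], ["session-a", 1]],
    "distances": [
        [0.0, 1.5],
        [1.5, 0.0],
    ],
}


def lookup_distance(result, row_case, column_case):
    """Return the distance between two cases identified by (session, index)."""
    row = result["row_case_indices"].index(list(row_case))
    column = result["column_case_indices"].index(list(column_case))
    return result["distances"][row][column]


print(lookup_distance(result, ("session-a", 0), ("session-a", 1)))
```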

classmethod download_amlg(config)#

Download amalgam binaries.

Requires the howso-build-artifacts dependency.

Parameters:

config (dict) – The amalgam configuration options.

Returns:

The path to the downloaded amalgam directory, or None if nothing was downloaded.

Return type:

Path

classmethod download_core(config)#

Download core binaries.

Requires the howso-build-artifacts dependency.

Parameters:

config (dict) – The core configuration options.

Returns:

The path to the downloaded core directory, or None if nothing was downloaded.

Return type:

Path

edit_cases(trainee_id, feature_values=None, *, case_indices=None, condition=None, condition_session=None, features=None, num_cases=None, precision=None, session=None)#

Edit feature values for the specified cases.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • feature_values (list of object) – The feature values to edit the case(s) with. If specified as a list, the order corresponds with the order of the features parameter.

  • case_indices (Iterable of Sequence[Union[str, int]], optional) – Iterable of Sequences containing the session id and index, where index is the original 0-based index of the case as it was trained into the session. This explicitly specifies the cases to edit. When specified, condition and condition_session are ignored.

  • condition (dict, optional) – A condition map to select which cases to edit. Ignored when case_indices are specified.

  • condition_session (str, optional) – If specified, ignores the condition and operates on all cases for the specified session.

  • features (iterable of str, optional) – The names of the features to edit. Corresponds to feature_values.

  • num_cases (int, default None) – The maximum amount of cases to edit. If not specified, the limit will be k cases if precision is “similar”, or no limit if precision is “exact”.

  • precision ({"exact", "similar"}, optional) – The precision to use when selecting the cases to edit, defaults to “exact”.

  • session (str, optional) – The identifier of the Trainee session to associate the edit with.

Returns:

A dictionary with key ‘count’ for the number of modified cases.

Return type:

dict
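The precedence between the three ways of selecting cases can be summarised in a few lines. This is a sketch of the rules stated in the parameter descriptions above, not the client's code; selection_mode is a hypothetical helper, and the "unspecified" result is a placeholder because the behaviour with no selector is not documented here.

```python
def selection_mode(case_indices=None, condition=None, condition_session=None):
    """Report which selector would take effect under the documented rules."""
    if case_indices is not None:
        # Explicit case indices win; condition and condition_session are ignored.
        return "case_indices"
    if condition_session is not None:
        # A condition session ignores the condition map.
        return "condition_session"
    if condition is not None:
        return "condition"
    return "unspecified"


print(selection_mode(case_indices=[("session-a", 0)], condition={"age": 30}))
print(selection_mode(condition={"age": 30}, condition_session="session-a"))
print(selection_mode(condition={"age": 30}))
```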

classmethod escape_filename(s)#

Escape filename.

Return type:

str

evaluate(trainee_id, features_to_code_map, *, aggregation_code=None)#

Evaluate custom code on feature values of all cases in the trainee.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • features_to_code_map (dict of str to str) – A dictionary with feature name keys and custom Amalgam code string values.

  • aggregation_code (str, optional) – A string of custom Amalgam code that can access the list of values derived from the custom code in features_to_code_map.

Returns:

A dictionary with keys: ‘evaluated’ and ‘aggregated’.

Return type:

dict

execute_on_subtrainee(trainee_id, method, *, as_external=False, child_id=None, child_name_path=None, payload=None, load_external_trainee_id=None)#

Executes any method in the engine API directly on any child trainee.

Parameters:
  • method (str) – Name of method to execute.

  • payload (dict) – Parameters specific to the method being called.

  • child_name_path (list of str, optional) – List of strings specifying the user-friendly path of the child subtrainee for execution of method.

  • child_id (str, optional) – Unique id of child trainee to execute method. Ignored if child_name_path is specified.

  • as_external (bool) – Applicable only to ‘load’ and ‘save’ methods and if specifying child_name_path or child_id. For ‘save’, stores the child out as an independent trainee and removes it as a contained entity. For ‘load’ updates hierarchy by adding the child as an independently stored trainee to the hierarchy without loading the trainee as a subtrainee.

  • load_external_trainee_id (str, optional) – Trainee id of trainee being loaded, must be specified only when method is ‘load’ and as_external is true.

  • trainee_id (str) – The id of the Trainee to execute methods on.

Returns:

Whatever output the executed method returns.

Return type:

object

export_trainee(trainee_id, path_to_trainee=None, decode_cases=False, separate_files=False)#

Export a saved Trainee’s data to json files for migration.

Parameters:
  • trainee_id (str) – The ID of the Trainee.

  • path_to_trainee (Path or str, optional) – The path to where the saved trainee file is located.

  • decode_cases (bool, default False) – Whether to export decoded cases.

  • separate_files (bool, default False) – Whether to load each case from its individual file.

Return type:

None

get_auto_ablation_params(trainee_id)#

Get trainee parameters for auto ablation set by set_auto_ablation_params().

get_cases(trainee_id, session=None, *, case_indices=None, indicate_imputed=False, features=None, condition=None, num_cases=None, precision=None)#

Retrieve cases from a Trainee.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • session (str, optional) – The session identifier to retrieve cases for, in their trained order.

  • case_indices (iterable of sequence of str, int, optional) – Iterable of Sequences, of session id and index, where index is the original 0-based index of the case as it was trained into the session. If specified, returns only these cases and ignores the session parameter.

  • indicate_imputed (bool, default False) – If set, an additional value will be appended to the cases indicating if the case was imputed.

  • features (iterable of str, optional) – A list of feature names to return values for in lieu of all default features.

  • condition (dict, optional) – The condition map to select the cases to retrieve that meet all the provided conditions.

  • num_cases (int, default None) – The maximum amount of cases to retrieve. If not specified, the limit will be k cases if precision is “similar”, or no limit if precision is “exact”.

  • precision ({"exact", "similar"}, optional) – The precision to use when retrieving the cases via condition. If not provided, “exact” will be used.

Returns:

A dictionary containing keys ‘features’ and ‘cases’.

Return type:

dict
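
As a minimal sketch of the call shape described above, the snippet below requests two specific cases by (session id, index) and limits the returned features. StubCore is a hypothetical stand-in that only mimics the documented get_cases() signature so the example runs without Howso installed; the trainee id, session id, and feature names are invented for illustration.

```python
from typing import Any, Dict, Iterable, Sequence


class StubCore:
    """Hypothetical stand-in for HowsoCore; it only echoes the request shape."""

    def get_cases(self, trainee_id, session=None, *, case_indices=None,
                  indicate_imputed=False, features=None, condition=None,
                  num_cases=None, precision=None) -> Dict[str, Any]:
        # A real core returns trained data. This stub returns one placeholder
        # row per requested index so the result shape can be inspected.
        names = list(features or [])
        rows = [[None] * len(names) for _ in (case_indices or [])]
        return {"features": names, "cases": rows}


def fetch_selected_cases(core, trainee_id: str,
                         case_indices: Iterable[Sequence],
                         features: Iterable[str]) -> Dict[str, Any]:
    """Retrieve only the given cases; session is ignored when indices are set."""
    return core.get_cases(
        trainee_id,
        case_indices=list(case_indices),
        features=list(features),
    )


result = fetch_selected_cases(
    StubCore(),
    "my-trainee",                              # hypothetical trainee id
    [("session-1", 0), ("session-1", 3)],      # (session id, 0-based index)
    ["age", "income"],                         # hypothetical feature names
)
print(sorted(result))
```

With a real core object, the same function could be passed a HowsoCore instance instead of the stub.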

get_entities()#

Get loaded entities.

Returns:

A list of entity identifiers that are currently loaded.

Return type:

list of str

get_feature_attributes(trainee_id)#

Get Trainee feature attributes.

Parameters:

trainee_id (str) – The identifier of the Trainee.

Returns:

A dictionary of feature name to dictionary of feature attributes.

Return type:

dict

get_feature_contributions(trainee_id, action_feature, robust=None, directional=False, weight_feature=None)#

Get cached feature contributions.

Return type:

Dict

Deprecated since version 1.0.0: Use get_prediction_stats() instead.

get_feature_mda(trainee_id, action_feature, permutation=None, robust=None, weight_feature=None)#

Get cached feature Mean Decrease In Accuracy (MDA).

Return type:

Dict

Deprecated since version 1.0.0: Use get_prediction_stats() instead.

get_feature_residuals(trainee_id, action_feature=None, robust=None, robust_hyperparameters=None, weight_feature=None)#

Get cached feature residuals.

Return type:

Dict

Deprecated since version 1.0.0: Use get_prediction_stats() instead.

get_hierarchy(trainee_id)#

Output the hierarchy for a trainee.

Returns:

Dictionary of the currently contained hierarchy as a nested dict, with False for trainees that are stored independently.

Return type:

dict of {str: dict}
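
A nested hierarchy of this kind can be flattened into name paths with a short recursive walk. The sketch below is illustrative only: the example dictionary is invented, and it assumes that a contained trainee with no children appears as an empty dict, which the description above does not state.

```python
from typing import Dict, Iterator, List, Tuple, Union

Hierarchy = Dict[str, Union[dict, bool]]


def iter_trainee_paths(hierarchy: Hierarchy,
                       prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[List[str], bool]]:
    """Yield (name_path, is_contained) for every trainee in the hierarchy."""
    for name, child in hierarchy.items():
        path = list(prefix) + [name]
        if child is False:
            # False marks a trainee that is stored independently.
            yield path, False
        else:
            yield path, True
            # Assumption: contained trainees map to a (possibly empty) dict.
            yield from iter_trainee_paths(child, tuple(path))


example = {"child_a": {"grandchild": {}}, "child_b": False}
paths = list(iter_trainee_paths(example))
for path, contained in paths:
    print("/".join(path), "contained" if contained else "external")
```

The resulting paths have the same list-of-str form as the name-path parameters used by the subtrainee methods.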

get_internal_parameters(trainee_id, *, action_feature=None, context_features=None, mode=None, weight_feature=None)#

Get the parameters used by the Trainee.

If ‘action_feature’, ‘context_features’, ‘mode’, or ‘weight_feature’ are specified, then the best hyperparameters analyzed in the Trainee are the value of the ‘hyperparameter_map’ key, otherwise this value will be the dictionary containing all the hyperparameter sets in the Trainee.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • action_feature (str, optional) – If specified will return the best analyzed hyperparameters to target this feature.

  • context_features (str, optional) – If specified, will find and return the best analyzed hyperparameters to use with these context features.

  • mode (str, optional) – If specified, will find and return the best analyzed hyperparameters that were computed in this mode.

  • weight_feature (str, optional) – If specified, will find and return the best analyzed hyperparameters that were analyzed using this weight feature.

Returns:

A dict including either all of the Trainee’s internal parameters or only the best hyperparameters selected using the passed parameters.

Return type:

dict

get_loaded_trainees()#

Get loaded Trainees.

Returns:

A list of trainee identifiers that are currently loaded.

Return type:

list of str

get_marginal_stats(trainee_id, *, condition=None, num_cases=None, precision=None, weight_feature=None)#

Get marginal stats for all features.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • condition (dict or None, optional) –

    A condition map to select which cases to compute marginal stats for.

    Note

    The dictionary keys are the feature name and values are one of:

    • None

    • A value, must match exactly.

    • An array of two numeric values, specifying an inclusive range. Only applicable to continuous and numeric ordinal features.

    • An array of string values, must match any of these values exactly. Only applicable to nominal and string ordinal features.

  • num_cases (int, default None) – The maximum amount of cases to use to calculate marginal stats. If not specified, the limit will be k cases if precision is “similar”. Only used if condition is not None.

  • precision (str, default None) – The precision to use when selecting cases with the condition. Options are ‘exact’ or ‘similar’. If not specified “exact” will be used. Only used if condition is not None.

  • weight_feature (str, optional) – When specified, will attempt to return stats that were computed using this weight_feature.

Returns:

A map of feature names to map of stat type to stat values.

Return type:

dict of str to dict of str to float
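
The condition-map rules in the note above can be pictured with a small pure-Python matcher. This is a sketch of the documented semantics, not the engine's implementation: case_matches is a hypothetical helper, and treating a None rule as "the feature value is missing" is an assumption.

```python
from numbers import Number
from typing import Any, Mapping


def _is_numeric_range(rule: Any) -> bool:
    """True for a two-element list/tuple of numbers (an inclusive range)."""
    return (
        isinstance(rule, (list, tuple))
        and len(rule) == 2
        and all(isinstance(r, Number) and not isinstance(r, bool) for r in rule)
    )


def case_matches(case: Mapping[str, Any], condition: Mapping[str, Any]) -> bool:
    """Return True when the case satisfies every entry of the condition map."""
    for feature, rule in condition.items():
        value = case.get(feature)
        if rule is None:
            # Assumption: None selects cases where the feature is missing.
            if value is not None:
                return False
        elif _is_numeric_range(rule):
            low, high = rule
            if not isinstance(value, Number) or not (low <= value <= high):
                return False
        elif isinstance(rule, (list, tuple)):
            # A list of strings: the value must equal one of them exactly.
            if value not in rule:
                return False
        elif value != rule:
            # A single value must match exactly.
            return False
    return True


cases = [
    {"age": 30, "color": "red"},
    {"age": 45, "color": "blue"},
    {"age": None, "color": "red"},
]
condition = {"age": [25, 40], "color": ["red", "green"]}
selected = [c for c in cases if case_matches(c, condition)]
print(selected)
```

All entries of the map must hold at once, so adding more features to the condition can only narrow the selection.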

get_metadata(trainee_id)#

Get trainee metadata.

Parameters:

trainee_id (str) – The identifier of the Trainee.

Returns:

The metadata dictionary.

Return type:

dict or None

get_num_training_cases(trainee_id)#

Return the number of trained cases in the model.

Parameters:

trainee_id (str) – The identifier of the Trainee.

Returns:

A dictionary containing the key “count”.

Return type:

dict

get_prediction_stats(trainee_id, *, action_feature=None, condition=None, num_cases=None, num_robust_influence_samples_per_case=None, precision=None, robust=None, robust_hyperparameters=None, stats=None, weight_feature=None)#

Get feature prediction stats.

Parameters:
  • trainee_id (str) – The id or name of the trainee.

  • action_feature (str, optional) – When specified, will attempt to return stats that were computed for this specified action_feature. Note: “.targetless” is the action feature used during targetless analysis.

  • condition (dict or None, optional) – A condition map to select which cases to compute prediction stats for.

  • num_cases (int, default None) – The maximum amount of cases to use to calculate prediction stats. If not specified, the limit will be k cases if precision is “similar”, or 1000 cases if precision is “exact”. Only used if condition is not None.

  • num_robust_influence_samples_per_case (int, optional) – Specifies the number of robust samples to use for each case for robust contribution computations. Defaults to 300 + 2 * (number of features).

  • precision (str, default None) – The precision to use when selecting cases with the condition. Options are ‘exact’ or ‘similar’. If not specified “exact” will be used. Only used if condition is not None.

  • robust (bool, optional) – When specified, will attempt to return stats that were computed with the specified robust or non-robust type.

  • robust_hyperparameters (bool, optional) – When specified, will attempt to return stats that were computed using hyperparameters with the specified robust or non-robust type.

  • stats (iterable of str, optional) – List of stats to output. When unspecified, returns all.

  • weight_feature (str, optional) – When specified, will attempt to return stats that were computed using this weight_feature.

Returns:

A map of feature to map of stat type to stat values.

Return type:

dict of str to dict of str to float

get_session_indices(trainee_id, session)#

Get list of all session indices for a specified session.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • session (str) – The identifier of the session.

Returns:

A list of the session indices for the session.

Return type:

list of int

get_session_metadata(trainee_id, session)#

Get the Trainee session metadata.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • session (str) – The identifier of the Trainee session.

Returns:

The metadata of the session, or None if no metadata is set.

Return type:

dict or None

get_session_training_indices(trainee_id, session)#

Get list of all session training indices for a specified session.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • session (str) – The identifier of the session.

Returns:

A list of the session training indices for the session.

Return type:

list of int

get_sessions(trainee_id, attributes)#

Get list of session names.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • attributes (list of str, optional) – List of attributes to return from the session. The session id is always included.

Returns:

The list of Trainee sessions.

Return type:

list of dict

get_substitute_feature_values(trainee_id)#

Get substitution feature values used in case generation.

Parameters:

trainee_id (str) – The identifier of the Trainee.

Returns:

The dictionary of feature name to value to substitution value.

Return type:

dict

get_trainee_version(trainee_id)#

Return the version of the Trainee Template.

Parameters:

trainee_id (str) – The identifier of the Trainee to get the version of.

Return type:

str

impute(trainee_id, *, batch_size=1, features=None, features_to_impute=None, session=None)#

Impute, or fill in the missing values, for the specified features.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • batch_size (int, default 1) – Larger batch size will increase speed but decrease accuracy. Batch size indicates how many rows to fill before recomputing conviction.

  • features (iterable of str, optional) – An iterable of feature names to use for imputation. If not specified, all features will be used.

  • features_to_impute (iterable of str, optional) – An iterable of feature names to impute. If not specified, features will be used (see above).

  • session (str, optional) – The identifier of the Trainee session to associate the edit with.

Return type:

None
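
The fallback between features and features_to_impute can be summarized as follows. resolve_impute_features is a hypothetical helper written for illustration; it only mirrors the defaults described in the parameter list and makes no claim about how the engine resolves them internally.

```python
from typing import Iterable, List, Optional, Tuple


def resolve_impute_features(
    all_features: Iterable[str],
    features: Optional[Iterable[str]] = None,
    features_to_impute: Optional[Iterable[str]] = None,
) -> Tuple[List[str], List[str]]:
    """Return (features used for imputation, features that get imputed)."""
    # features defaults to every feature of the Trainee.
    used = list(features) if features is not None else list(all_features)
    # features_to_impute defaults to whatever features resolved to.
    targets = list(features_to_impute) if features_to_impute is not None else used
    return used, targets


trained = ["age", "income", "city"]            # hypothetical feature names
print(resolve_impute_features(trained))
print(resolve_impute_features(trained, features=["age", "income"]))
print(resolve_impute_features(trained, features_to_impute=["city"]))
```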

load(trainee_id, filename=None, filepath=None)#

Load a persisted Trainee from disk.

Parameters:
  • trainee_id (str) – The identifier of the Trainee to load.

  • filename (str, optional) – The filename to load.

  • filepath (str, optional) – The path containing the filename to load.

Returns:

A dict containing the name of the trainee that was created.

Return type:

dict

load_subtrainee(trainee_id, *, filename=None, filepath=None, trainee_name_path=None)#

Load a persisted Trainee from disk as a subtrainee.

Parameters:
  • trainee_id (str) – The identifier of the Trainee to be modified.

  • filename (str, optional) – The filename to load.

  • filepath (str, optional) – The path containing the filename to load.

  • trainee_name_path (list of str, optional) – List of strings specifying the user-friendly path of the child subtrainee to load.

Returns:

A dict containing the name of the trainee that was created.

Return type:

dict

move_cases(trainee_id, num_cases=1, *, case_indices=None, condition=None, condition_session=None, distribute_weight_feature=None, precision=None, preserve_session_data=False, session=None, source_id=None, source_name_path=None, target_name_path=None, target_id=None)#

Moves cases from one trainee to another in the hierarchy.

Parameters:
  • trainee_id (str) – The identifier of the Trainee doing the moving.

  • num_cases (int) – The number of cases to move; minimum 1 case must be moved. Ignored if case_indices is specified.

  • case_indices (list of tuples) – A list of tuples containing session ID and session training index for each case to be moved.

  • condition (dict, optional) – The condition map to select the cases to move that meet all the provided conditions. Ignored if case_indices is specified.

  • condition_session (str, optional) – If specified, ignores the condition and operates on cases for the specified session id. Ignored if case_indices is specified.

  • precision ({"exact", "similar"}, optional) – The precision to use when moving the cases. Options are ‘exact’ or ‘similar’. If not specified, “exact” will be used. Ignored if case_indices is specified.

  • preserve_session_data (bool, default False) – When True, will move cases without cleaning up session data.

  • session (str, optional) – The identifier of the Trainee session to associate the move with.

  • source_id (str, optional) – The source trainee unique id from which to move cases. Ignored if source_name_path is specified. If neither source_name_path nor source_id are specified, moves cases from the trainee itself.

  • source_name_path (list of str, optional) – List of strings specifying the user-friendly path of the child subtrainee from which to move cases.

  • target_name_path (list of str, optional) – List of strings specifying the user-friendly path of the child subtrainee to move cases to.

  • target_id (str, optional) – The target trainee id to move the cases to. Ignored if target_name_path is specified. If neither target_name_path nor target_id are specified, moves cases to the trainee itself.

Returns:

A dictionary with key ‘count’ for the number of moved cases.

Return type:

dict
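
The source and target parameters follow the same precedence: a name path wins over an id, and with neither the trainee itself is used. The sketch below restates that rule in plain Python; resolve_endpoint and its return format are hypothetical and exist only to make the precedence explicit.

```python
from typing import List, Optional, Tuple, Union


def resolve_endpoint(
    trainee_id: str,
    name_path: Optional[List[str]] = None,
    unique_id: Optional[str] = None,
) -> Tuple[str, Union[str, List[str]]]:
    """Return which trainee a source_* or target_* pair refers to."""
    if name_path is not None:
        # The id is ignored whenever a name path is specified.
        return ("name_path", list(name_path))
    if unique_id is not None:
        return ("id", unique_id)
    # Neither given: cases move from (or to) the trainee itself.
    return ("self", trainee_id)


print(resolve_endpoint("parent", name_path=["child_a"], unique_id="abc123"))
print(resolve_endpoint("parent", unique_id="abc123"))
print(resolve_endpoint("parent"))
```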

pairwise_distances(trainee_id, features=None, *, action_feature=None, from_case_indices=None, from_values=None, to_case_indices=None, to_values=None, use_case_weights=False, weight_feature=None)#

Compute pairwise distances between specified cases.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • features (iterable of str, optional) – List of feature names to use when computing pairwise distances. If unspecified uses all features.

  • action_feature (str, optional) – The action feature. If specified, uses targeted hyperparameters used to predict this action_feature, otherwise uses targetless hyperparameters.

  • from_case_indices (Iterable of Sequence[Union[str, int]], optional) – An iterable of sequences, of session id and index, where index is the original 0-based index of the case as it was trained into the session. If specified must be either length of 1 or match length of to_values or to_case_indices.

  • from_values (list of list of object, optional) – A 2d-list of case values. If specified must be either length of 1 or match length of to_values or to_case_indices.

  • to_case_indices (Iterable of Sequence[Union[str, int]], optional) – An Iterable of Sequences, of session id and index, where index is the original 0-based index of the case as it was trained into the session. If specified must be either length of 1 or match length of from_values or from_case_indices.

  • to_values (list of list of object, optional) – A 2d-list of case values. If specified must be either length of 1 or match length of from_values or from_case_indices.

  • use_case_weights (bool, default False) – If set to True, will scale influence weights by each case’s weight_feature weight.

  • weight_feature (str, optional) – Name of feature whose values to use as case weights. When left unspecified uses the internally managed case weight.

Returns:

A list of computed pairwise distances between each corresponding pair of cases in from_case_indices and to_case_indices.

Return type:

list
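
The length rule for the from_* and to_* arguments (each side is either length 1 or the same length as the other side) can be checked before making the call. check_pair_lengths is a hypothetical helper; returning the number of resulting pairs assumes that a length-1 side is paired with every entry of the other side, which the description implies but does not spell out.

```python
from typing import Sequence


def check_pair_lengths(from_items: Sequence, to_items: Sequence) -> int:
    """Validate the from/to lengths and return the number of pairs."""
    n_from, n_to = len(from_items), len(to_items)
    if n_from != n_to and 1 not in (n_from, n_to):
        raise ValueError(
            f"from has {n_from} entries and to has {n_to}; "
            "each side must have length 1 or match the other side"
        )
    # Assumption: a single entry is paired with each entry on the other side.
    return max(n_from, n_to)


one_to_many = check_pair_lengths([("session-1", 0)],
                                 [("session-1", 1), ("session-1", 2)])
print(one_to_many)
```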

persist(trainee_id, filename=None, filepath=None)#

Save a Trainee to disk.

Parameters:
  • trainee_id (str) – The identifier of the Trainee to save.

  • filename (str, optional) – The name of the file to save the Trainee to.

  • filepath (str, optional) – The path of the file to save the Trainee to.

Return type:

None

static random_handle()#

Generate a random 6 byte hexadecimal handle.

Returns:

A random 6 byte hex.

Return type:

str

react(trainee_id, *, action_features=None, action_values=None, allow_nulls=False, case_indices=None, context_features=None, context_values=None, derived_action_features=None, derived_context_features=None, desired_conviction=None, details=None, exclude_novel_nominals_from_uniqueness_check=False, extra_features=None, feature_bounds_map=None, generate_new_cases='no', input_is_substituted=False, into_series_store=None, leave_case_out=False, new_case_threshold='min', ordered_by_specified_features=False, post_process_features=None, post_process_values=None, preserve_feature_values=None, substitute_output=True, use_case_weights=False, use_regional_model_residuals=True, weight_feature=None)#

Single case react.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • context_values (list of list of object, optional) – A 2d list of context values to react to. If None for discriminative react, it is assumed that session and session_id keys are set in the details.

  • action_features (iterable of str, optional) – An iterable of feature names to treat as action features during react.

  • action_values (list of list of object, optional) – One or more action values to use for action features. If specified, will only return the specified explanation details for the given actions. (Discriminative reacts only)

  • allow_nulls (bool, default False) – When true will allow return of null values if there are nulls in the local model for the action features, applicable only to discriminative reacts.

  • context_features (iterable of str, optional) – An iterable of feature names to treat as context features during react.

  • derived_context_features (iterable of str, optional) – An iterable of feature names whose values should be computed from the provided context in the specified order. Must be different than context_features.

  • derived_action_features (iterable of str, optional) – An iterable of feature names whose values should be computed after generation from the generated case prior to output, in the specified order. Must be a subset of action_features.

  • input_is_substituted (bool, default False) – If True, assumes provided categorical (nominal or ordinal) feature values have already been substituted.

  • substitute_output (bool, default True) – If False, will not substitute categorical feature values. Only applicable if a substitution value map has been set.

  • details (dict, optional) – If details are specified, the response will contain the requested explanation data along with the reaction.

  • desired_conviction (float) – If specified, will execute a generative react. If not specified, will execute a discriminative react. Conviction is the ratio of expected surprisal to generated surprisal for each feature generated; valid values are in the range of zero to infinity.

  • weight_feature (str, optional) – Name of feature whose values to use as case weights. When left unspecified uses the internally managed case weight.

  • use_case_weights (bool, default False) – If set to True will scale influence weights by each case’s weight_feature weight.

  • case_indices (Iterable of Sequence[Union[str, int]], defaults to None) – An Iterable of Sequences, of session id and index, where index is the original 0-based index of the case as it was trained into the session. If this case does not exist, discriminative react outputs null, generative react ignores it.

  • post_process_features (iterable of str, optional) – List of feature names that will be made available during the execution of post_process feature attributes.

  • post_process_values (list of object, optional) – A 2d list of values corresponding to post_process_features that will be made available during the execution of post_process feature attributes.

  • preserve_feature_values (iterable of str) – List of features that will preserve their values from the case specified by case_indices, appending and overwriting the specified contexts as necessary. For generative reacts, if case_indices isn’t specified will preserve feature values of a random case.

  • leave_case_out (bool, default False) – If set to True and specified along with case_indices, each individual react will respectively ignore the corresponding case specified by case_indices by leaving it out.

  • into_series_store (str, optional) – The name of a series store. If specified, will store an internal record of all react contexts for this session and series to be used later with train series.

  • use_regional_model_residuals (bool) – If False, uses model feature residuals; if True, recalculates regional model residuals.

  • feature_bounds_map (dict of dict) – A mapping of feature names to the bounds for the feature values to be generated in.

  • generate_new_cases ({"always", "attempt", "no"}, default "no") – How to generate new cases.

  • ordered_by_specified_features (bool, default False) – If True order of generated feature values will match the order of specified features.

  • new_case_threshold ({"min", "max", "most_similar"}, optional) – Distance to determine the privacy cutoff. If None, will default to “min”.

  • exclude_novel_nominals_from_uniqueness_check (bool, default False) – If True, will exclude features which have a subtype defined in their feature attributes from the uniqueness check that happens when generate_new_cases is True. Only applies to generative reacts.

Returns:

The react result including audit details.

Return type:

dict
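
Whether a react is discriminative or generative depends only on desired_conviction. The sketch below assembles keyword arguments for either mode; build_react_kwargs and react_mode are hypothetical helpers, the feature names are invented, and the row-length check and the rejection of negative convictions are assumptions rather than documented validation.

```python
from typing import Any, Dict, List, Optional


def build_react_kwargs(
    context_features: List[str],
    context_values: List[List[Any]],
    action_features: List[str],
    desired_conviction: Optional[float] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments matching the react() signature above."""
    # Assumption: each row of the 2d context_values aligns with context_features.
    if any(len(row) != len(context_features) for row in context_values):
        raise ValueError("each context row needs one value per context feature")
    kwargs: Dict[str, Any] = {
        "context_features": list(context_features),
        "context_values": [list(row) for row in context_values],
        "action_features": list(action_features),
    }
    if desired_conviction is not None:
        # Valid convictions range from zero to infinity.
        if desired_conviction < 0:
            raise ValueError("desired_conviction must not be negative")
        kwargs["desired_conviction"] = float(desired_conviction)
    return kwargs


def react_mode(kwargs: Dict[str, Any]) -> str:
    """Name the react type that the given arguments would trigger."""
    return "generative" if "desired_conviction" in kwargs else "discriminative"


predict = build_react_kwargs(["age", "income"], [[42, 55000]], ["risk"])
generate = build_react_kwargs(["age"], [[42]], ["income", "risk"],
                              desired_conviction=5)
print(react_mode(predict), react_mode(generate))
```

Either dictionary could then be unpacked into a call such as core.react(trainee_id, **kwargs) on a real core object.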

react_into_features(trainee_id, *, distance_contribution=False, familiarity_conviction_addition=False, familiarity_conviction_removal=False, features=None, influence_weight_entropy=False, p_value_of_addition=False, p_value_of_removal=False, similarity_conviction=False, use_case_weights=False, weight_feature=None)#

Calculate and cache conviction and other statistics.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • features (iterable of str, optional) – An iterable of features to calculate convictions.

  • familiarity_conviction_addition (bool or str, default False) – The name of the feature to store conviction of addition values. If set to True the values will be stored to the feature ‘familiarity_conviction_addition’.

  • familiarity_conviction_removal (bool or str, default False) – The name of the feature to store conviction of removal values. If set to True the values will be stored to the feature ‘familiarity_conviction_removal’.

  • influence_weight_entropy (bool or str, default False) – The name of the feature to store influence weight entropy values in. If set to True, the values will be stored in the feature ‘influence_weight_entropy’.

  • p_value_of_addition (bool or str, default False) – The name of the feature to store p value of addition values. If set to True the values will be stored to the feature ‘p_value_of_addition’.

  • p_value_of_removal (bool or str, default False) – The name of the feature to store p value of removal values. If set to True the values will be stored to the feature ‘p_value_of_removal’.

  • similarity_conviction (bool or str, default False) – The name of the feature to store similarity conviction values. If set to True the values will be stored to the feature ‘similarity_conviction’.

  • distance_contribution (bool or str, default False) – The name of the feature to store distance contribution. If set to True the values will be stored to the feature ‘distance_contribution’.

  • weight_feature (str, optional) – Name of feature whose values to use as case weights. When left unspecified uses the internally managed case weight.

  • use_case_weights (bool, default False) – If set to True will scale influence weights by each case’s weight_feature weight.

Return type:

None
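
Each of the bool-or-str options resolves to the name of the feature the values are stored in. The sketch below restates that rule; resolve_stored_features is a hypothetical helper and the custom feature name is invented.

```python
from typing import Dict, Union


def resolve_stored_features(**options: Union[bool, str]) -> Dict[str, str]:
    """Map each enabled option to the feature name its values are stored in."""
    resolved = {}
    for option, value in options.items():
        if value is True:
            # True stores the values under the option's own name.
            resolved[option] = option
        elif isinstance(value, str):
            # A string stores the values under that custom feature name.
            resolved[option] = value
        # False (the default) means the statistic is not computed.
    return resolved


targets = resolve_stored_features(
    similarity_conviction=True,
    distance_contribution="dist_contrib",   # hypothetical custom name
    p_value_of_addition=False,
)
print(targets)
```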

react_into_trainee(trainee_id, *, action_feature=None, context_features=None, contributions=None, contributions_robust=None, hyperparameter_param_path=None, mda=None, mda_permutation=None, mda_robust=None, mda_robust_permutation=None, num_robust_influence_samples=None, num_robust_residual_samples=None, num_robust_influence_samples_per_case=None, num_samples=None, residuals=None, residuals_robust=None, sample_model_fraction=None, sub_model_size=None, use_case_weights=False, weight_feature=None)#

Compute and cache specified feature prediction stats.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • action_feature (str, optional) – Name of target feature for which to do computations. Default is whatever the model was analyzed for, e.g., action feature for MDA and contributions, or “.targetless” if analyzed for targetless. This parameter is required for MDA or contributions computations.

  • context_features (iterable of str, optional) – List of features names to use as contexts for computations. Default is all trained non-unique features if unspecified.

  • contributions (bool, optional) – For each context_feature, use the full set of all other context_features to compute the mean absolute delta between prediction of action_feature with and without the context_feature in the model. False removes cached values.

  • contributions_robust (bool, optional) – For each context_feature, use the robust (power set/permutation) set of all other context_features to compute the mean absolute delta between prediction of action_feature with and without the context_feature in the model. False removes cached values.

  • hyperparameter_param_path (iterable of str, optional) – Full path for hyperparameters to use for computation. If specified for any residual computations, takes precedence over action_feature parameter. Can be set to a ‘paramPath’ value from the results of ‘get_params()’ for a specific set of hyperparameters.

  • mda (bool, optional) – When True will compute Mean Decrease in Accuracy (MDA) for each context feature at predicting the action_feature. Drop each feature and use the full set of remaining context features for each prediction. False removes cached values.

  • mda_permutation (bool, optional) – Compute MDA by scrambling each feature and using the full set of remaining context features for each prediction. False removes cached values.

  • mda_robust (bool, optional) – Compute MDA by dropping each feature and using the robust (power set/permutations) set of remaining context features for each prediction. False removes cached values.

  • mda_robust_permutation (bool, optional) – Compute MDA by scrambling each feature and using the robust (power set/permutations) set of remaining context features for each prediction. False removes cached values.

  • num_robust_influence_samples (int, optional) – Total sample size of model to use (using sampling with replacement) for robust contribution computation. Defaults to 300.

  • num_robust_residual_samples (int, optional) – Total sample size of model to use (using sampling with replacement) for robust mda and residual computation. Defaults to 1000 * (1 + log(number of features)). Note: robust mda will be updated to use num_robust_influence_samples in a future release.

  • num_robust_influence_samples_per_case (int, optional) – Specifies the number of robust samples to use for each case for robust contribution computations. Defaults to 300 + 2 * (number of features).

  • num_samples (int, optional) – Total sample size of model to use (using sampling with replacement) for all non-robust computation. Defaults to 1000. If specified overrides sample_model_fraction.

  • residuals (bool, optional) – For each context_feature, use the full set of all other context_features to predict the feature. When True computes and caches MAE (mean absolute error), R^2, RMSE (root mean squared error), and Spearman Coefficient for continuous features, and MAE, accuracy, precision, recall, and Matthews correlation coefficient for nominal features. False removes cached values.

  • residuals_robust (bool, optional) – For each context_feature, computes and caches the same stats as residuals but using the robust (power set/permutations) set of all other context_features to predict the feature. False removes cached values.

  • sample_model_fraction (float, optional) – A value between 0.0 - 1.0, percent of model to use in sampling (using sampling without replacement). Applicable only to non-robust computation. Ignored if num_samples is specified. Higher values provide better accuracy at the cost of compute time.

  • sub_model_size (int, optional) – Subset of model to use for calculations. Applicable only to models > 1000 cases.

  • use_case_weights (bool, default False) – If set to True will scale influence weights by each case’s weight_feature weight.

  • weight_feature (str, optional) – The name of feature whose values to use as case weights. When left unspecified uses the internally managed case weight.

Return type:

None
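
The documented sample-size defaults are simple functions of the feature count. The sketch below evaluates them; the function names are hypothetical, the logarithm is assumed to be the natural log because the base is not stated, and rounding the fraction-based size is likewise an assumption.

```python
import math
from typing import Optional


def default_robust_residual_samples(num_features: int) -> float:
    """1000 * (1 + log(number of features)); natural log is assumed."""
    return 1000 * (1 + math.log(num_features))


def default_robust_influence_samples_per_case(num_features: int) -> int:
    """300 + 2 * (number of features)."""
    return 300 + 2 * num_features


def non_robust_sample_size(
    model_size: int,
    num_samples: Optional[int] = None,
    sample_model_fraction: Optional[float] = None,
) -> int:
    """num_samples overrides sample_model_fraction; the default is 1000."""
    if num_samples is not None:
        return num_samples
    if sample_model_fraction is not None:
        # Assumption: the fraction of the model is rounded to whole cases.
        return round(model_size * sample_model_fraction)
    return 1000


print(default_robust_influence_samples_per_case(10))
print(round(default_robust_residual_samples(10)))
print(non_robust_sample_size(5000, sample_model_fraction=0.5))
```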

remove_cases(trainee_id, num_cases=1, *, case_indices=None, condition=None, condition_session=None, distribute_weight_feature=None, precision=None, preserve_session_data=False, session=None)#

Removes cases from a Trainee.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • num_cases (int) – The number of cases to remove; minimum 1 case must be removed. Ignored if case_indices is specified.

  • case_indices (list of tuples) – A list of tuples containing session ID and session training index for each case to be removed.

  • condition (dict of str to object, optional) – The condition map to select the cases to remove that meet all the provided conditions. Ignored if case_indices is specified.

  • condition_session (str, optional) – If specified, ignores the condition and operates on cases for the specified session id. Ignored if case_indices is specified.

  • distribute_weight_feature (str, optional) – When specified, will distribute the removed cases’ weights from this feature into their neighbors.

  • precision ({"exact", "similar"}, optional) – The precision to use when removing the cases, defaults to “exact”. Ignored if case_indices is specified.

  • preserve_session_data (bool, default False) – When True, will remove cases without cleaning up session data.

  • session (str, optional) – The identifier of the Trainee session to associate the removal with.

Returns:

A dictionary with key ‘count’ for the number of removed cases.

Return type:

dict
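
The "ignored if" notes above imply an order of precedence among the case selectors. The sketch below makes that order explicit; active_selector is a hypothetical helper, and returning None when no selector is given is a choice made for this example, since the behavior in that situation is not described.

```python
from typing import Any, List, Mapping, Optional, Tuple


def active_selector(
    case_indices: Optional[List[Tuple[str, int]]] = None,
    condition: Optional[Mapping[str, Any]] = None,
    condition_session: Optional[str] = None,
) -> Optional[str]:
    """Return the name of the parameter that decides which cases are removed."""
    if case_indices is not None:
        # Everything else, including num_cases, is ignored.
        return "case_indices"
    if condition_session is not None:
        # The condition map is ignored when a session id is given.
        return "condition_session"
    if condition is not None:
        return "condition"
    return None


print(active_selector(case_indices=[("session-1", 0)], condition={"age": 30}))
print(active_selector(condition={"age": 30}, condition_session="session-1"))
print(active_selector(condition={"age": 30}))
```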

remove_feature(trainee_id, feature, *, condition=None, condition_session=None, session=None)#

Remove a feature.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • feature (str) – The feature name.

  • condition (str, optional) – A condition map where features will only be removed when certain criteria are met.

  • condition_session (str, optional) – If specified, ignores the condition parameter and operates on cases for the specified session id.

  • session (str, optional) – The identifier of the Trainee session to associate the feature removal with.

Return type:

None

remove_series_store(trainee_id, series=None)#

Delete part or all of the series store from a Trainee.

Parameters:
  • trainee_id (str) – The ID of the Trainee to delete the series store from.

  • series (str, optional) – The ID of the series to remove from the series store. If None, the entire series store will be deleted.

Return type:

None

remove_session(trainee_id, session)#

Remove a Trainee session.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • session (str) – The identifier of the Trainee session.

Return type:

None

rename_subtrainee(trainee_id, new_name, *, child_id=None, child_name_path=None)#

Renames a contained child trainee in the hierarchy.

Parameters:
  • trainee_id (str) – The ID of the Trainee whose child to rename.

  • new_name (str) – New name of child trainee.

  • child_id (str, optional) – Unique id of child trainee to rename. Ignored if child_name_path is specified.

  • child_name_path (list of str, optional) – List of strings specifying the user-friendly path of the child subtrainee to rename.

Return type:

None

reset_parameter_defaults(trainee_id)#

Reset Trainee hyperparameters and thresholds.

Parameters:

trainee_id (str) – The identifier of the Trainee.

Return type:

None

retrieve_extreme_cases_for_feature(trainee_id, num, sort_feature, features=None)#

Gets the extreme cases of a Trainee.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • num (int) – The number of cases to get.

  • sort_feature (str) – The feature name by which extreme cases are sorted.

  • features (iterable of str, optional) – An iterable of feature names to use when getting extreme cases.

Returns:

A dictionary of keys ‘cases’ and ‘features’.

Return type:

dict
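
As an illustration of the returned shape, the sketch below pairs each row under 'cases' with the names under 'features'. The sample dictionary is made up, `to_records` is a hypothetical helper that is not part of howso, and the sketch assumes each case is a list ordered like 'features'.

```python
def to_records(result):
    """Pair every case row with the feature names it was returned with."""
    features = result["features"]
    return [dict(zip(features, row)) for row in result["cases"]]


# Hypothetical result, shaped like the dictionary described above.
sample = {
    "features": ["height", "weight"],
    "cases": [[1.92, 88.0], [1.55, 51.5]],
}
records = to_records(sample)
```

Each entry of `records` is then a plain mapping of feature name to value for one extreme case.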

save_subtrainee(trainee_id, *, filename=None, filepath=None, subtrainee_id=None, trainee_name_path=None)#

Save a subtrainee to disk.

Parameters:
  • trainee_id (str) – The identifier of the Trainee to be modified.

  • filename (str, optional) – The name of the file to save the Trainee to.

  • filepath (str, optional) – The path of the file to save the Trainee to.

  • subtrainee_id (str, optional) – Unique id for subtrainee. Must be provided if subtrainee does not have one already specified.

  • trainee_name_path (list of str, optional) – List of strings specifying the user-friendly path of the child subtrainee to save.

Return type:

None

set_auto_ablation_params(trainee_id, auto_ablation_enabled=False, *, auto_ablation_weight_feature='.case_weight', conviction_lower_threshold=None, conviction_upper_threshold=None, exact_prediction_features=None, influence_weight_entropy_threshold=0.6, minimum_model_size=1000, relative_prediction_threshold_map=None, residual_prediction_features=None, tolerance_prediction_threshold_map=None, **kwargs)#

Set trainee parameters for auto ablation.

Note

Auto-ablation is experimental and the API may change without deprecation.

Parameters:
  • trainee_id (str) – The ID of the Trainee to set auto ablation parameters for.

  • auto_ablation_enabled (bool, default False) – When True, the train() method will ablate cases that meet the set criteria.

  • auto_ablation_weight_feature (str, default ".case_weight") – The weight feature that should be accumulated to when cases are ablated.

  • minimum_model_size (int, default 1,000) – The threshold for the minimum number of cases at which the model should auto-ablate.

  • influence_weight_entropy_threshold (float, default 0.6) – The influence weight entropy quantile that a case must be beneath in order to be trained.

  • exact_prediction_features (Optional[List[str]], optional) – For each of the features specified, will ablate a case if the prediction matches exactly.

  • residual_prediction_features (Optional[List[str]], optional) – For each of the features specified, will ablate a case if abs(prediction - case value) / prediction <= feature residual.

  • tolerance_prediction_threshold_map (Optional[Dict[str, Tuple[float, float]]], optional) – For each of the features specified, will ablate a case if the prediction >= (case value - MIN) and the prediction <= (case value + MAX).

  • relative_prediction_threshold_map (Optional[Dict[str, float]], optional) – For each of the features specified, will ablate a case if abs(prediction - case value) / prediction <= relative threshold.

  • conviction_lower_threshold (Optional[float], optional) – The conviction value above which cases will be ablated.

  • conviction_upper_threshold (Optional[float], optional) – The conviction value below which cases will be ablated.
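
The tolerance and relative threshold rules above are plain inequalities, so they can be checked by hand. The following is a minimal sketch of the arithmetic only, transcribed from the parameter descriptions; the function names are hypothetical and this is not the engine's implementation.

```python
def within_tolerance(prediction, case_value, bounds):
    """True when (case value - MIN) <= prediction <= (case value + MAX)."""
    minimum, maximum = bounds  # the (MIN, MAX) tuple from the threshold map
    return (case_value - minimum) <= prediction <= (case_value + maximum)


def within_relative(prediction, case_value, threshold):
    """True when abs(prediction - case value) / prediction <= threshold.

    The formula is copied as documented, so a prediction of 0 raises
    ZeroDivisionError in this sketch.
    """
    return abs(prediction - case_value) / prediction <= threshold
```

For example, with a tolerance of `(0.5, 0.5)` a prediction of 10.4 for a case value of 10.0 falls inside the band, while 10.6 does not.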

set_conviction_lower_threshold(trainee_id, threshold)#

Set the conviction lower threshold.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • threshold (float) – The threshold value.

Return type:

None

set_conviction_upper_threshold(trainee_id, threshold)#

Set the conviction upper threshold.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • threshold (float) – The threshold value.

Return type:

None

set_feature_attributes(trainee_id, feature_attributes)#

Sets feature attributes for a Trainee.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • feature_attributes (dict of str to dict) – A dictionary of feature name to dictionary of feature attributes.

Returns:

The updated feature attributes.

Return type:

dict

set_internal_parameters(trainee_id, params)#

Sets specific model parameters in the Trainee.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • params (dict) – A dictionary containing the internal parameters.

Return type:

None

set_metadata(trainee_id, metadata)#

Set trainee metadata.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • metadata (dict or None) – The metadata dictionary.

Return type:

None

set_random_seed(trainee_id, seed)#

Sets the random seed for the Trainee.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • seed (int or float or str) – The random seed.

Return type:

None

set_session_metadata(trainee_id, session, metadata)#

Set the Trainee session metadata.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • session (str) – The identifier of the Trainee session.

  • metadata (dict) – The metadata to associate to the session.

Return type:

None

set_substitute_feature_values(trainee_id, substitution_value_map)#

Set substitution feature values used in case generation.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • substitution_value_map (dict or None) – A dictionary of feature name to value to substitution value. If the map is None, all substitutions will be disabled and cleared.

Return type:

None
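
To make the nesting of substitution_value_map concrete (feature name, then original value, then substitute), here is a small sketch. How the engine applies substitutions during case generation is not described here, so `substitute` is a hypothetical helper that only illustrates the shape of the map.

```python
def substitute(case, substitution_value_map):
    """Return a copy of case with mapped values replaced by their substitutes."""
    if substitution_value_map is None:
        # None disables substitution, so values pass through unchanged.
        return dict(case)
    return {
        feature: substitution_value_map.get(feature, {}).get(value, value)
        for feature, value in case.items()
    }


# Hypothetical map: feature name -> original value -> substitution value.
value_map = {"color": {"red": "A", "blue": "B"}}
```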

train(trainee_id, input_cases, features=None, *, accumulate_weight_feature=None, derived_features=None, input_is_substituted=False, series=None, session=None, skip_auto_analyze=False, train_weights_only=False)#

Train one or more cases into a trainee (model).

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • input_cases (list of list of object) – One or more cases to train into the model.

  • features (iterable of str, optional) – An iterable of feature names corresponding to the input cases.

  • accumulate_weight_feature (str, optional) – Name of feature into which to accumulate neighbors’ influences as weight for ablated cases. If unspecified, will not accumulate weights.

  • derived_features (iterable of str, optional) – List of feature names for which values should be derived in the specified order.

  • input_is_substituted (bool, default False) – If True, assumes provided nominal feature values have already been substituted.

  • series (str, optional) – Name of the series to pull features and case values from internal series storage.

  • session (str, optional) – The identifier of the Trainee session to associate the cases with.

  • skip_auto_analyze (bool, default False) – When true, the Trainee will not auto-analyze when appropriate. Instead, the response object will contain an “analyze” status when the set auto-analyze parameters indicate that an analyze is needed.

  • train_weights_only (bool, default False) – When true, and accumulate_weight_feature is provided, will accumulate all of the cases’ neighbor weights instead of training the cases into the model.

Return type:

Tuple[Dict, int, int]

Returns:

  • dict – A dictionary containing the trained details.

  • int – The request payload size.

  • int – The result payload size.
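
Because input_cases is a list of lists whose columns line up with features, data held as dictionaries needs to be flattened first. A minimal sketch of that preparation step follows; `to_input_cases` is a hypothetical helper, and filling missing values with None is an assumption made for the example.

```python
def to_input_cases(records, features):
    """Turn a list of dicts into rows ordered like features."""
    return [[record.get(name) for name in features] for record in records]


features = ["length", "width", "label"]
records = [
    {"length": 10, "width": 4, "label": "a"},
    {"width": 7, "length": 3},  # no "label" key, so the row gets None
]
input_cases = to_input_cases(records, features)
```

The resulting `input_cases` and `features` have the shapes that the train parameters describe.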

classmethod unescape_filename(s)#

Unescape filename.

Return type:

str

upgrade_trainee(trainee_id, path_to_trainee=None, separate_files=False)#

Upgrade a saved Trainee to current version.

Parameters:
  • trainee_id (str) – The identifier of the Trainee.

  • path_to_trainee (Path or str, optional) – The path to where the saved Trainee file is located.

  • separate_files (bool, default False) – Whether to load each case from its individual file.

Return type:

None

class howso.direct.HowsoDirectClient(howso_core=None, *, config_path=None, debug=False, react_initial_batch_size=10, train_initial_batch_size=100, verbose=False, version_check=True, **kwargs)#

Bases: AbstractHowsoClient

The direct Howso client.

A client which provides access to the Howso core endpoints via a direct interface using dynamic libraries.

Parameters:
  • howso_core (howso.direct.HowsoCore, optional) –

    A specified howso core direct interface object.

    If None, a HowsoCore will be initialized.

  • config_path (str or Path or None, optional) –

    A configuration file in yaml format that specifies Howso engine settings.

    If not set, the client will also check in order of precedence:
    • HOWSO_CONFIG environment variable

    • The current directory for howso.yml, howso.yaml, config.yml

    • ~/.howso for howso.yml, howso.yaml, config.yml.

  • debug (bool, default False) – Set debug output.

  • react_initial_batch_size (int, default 10) – The default number of cases to react to in the first batch for calls to HowsoDirectClient.react().

  • train_initial_batch_size (int, default 100) – The default number of cases to train to in the first batch for calls to HowsoDirectClient.train().

  • verbose (bool, default False) – Set verbose output.

  • version_check (bool, default True) – Check if the latest version of Howso engine is installed.
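
The lookup order described for config_path can be sketched as follows. This is not the client's own code: `find_config` and its `environ`, `cwd` and `home` arguments are hypothetical, and trying the three file names in the order they are listed is an assumption.

```python
import os
from pathlib import Path

CONFIG_NAMES = ("howso.yml", "howso.yaml", "config.yml")


def find_config(config_path=None, environ=None, cwd=None, home=None):
    """Return the first configuration path found, or None."""
    environ = os.environ if environ is None else environ
    cwd = Path.cwd() if cwd is None else Path(cwd)
    home = Path.home() if home is None else Path(home)

    # 1. An explicit config_path wins.
    if config_path is not None:
        return Path(config_path)
    # 2. Then the HOWSO_CONFIG environment variable.
    if environ.get("HOWSO_CONFIG"):
        return Path(environ["HOWSO_CONFIG"])
    # 3. Then the current directory, and finally ~/.howso.
    for directory in (cwd, home / ".howso"):
        for name in CONFIG_NAMES:
            candidate = directory / name
            if candidate.is_file():
                return candidate
    return None
```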

acquire_trainee_resources(trainee_id, *, max_wait_time=None)#

Acquire resources for a trainee in the Howso service.

Parameters:
  • trainee_id (str) – The ID of the Trainee to acquire resources for.

  • max_wait_time (int or float, optional) – (Not implemented) The number of seconds to wait to acquire trainee resources before aborting gracefully.

Raises:

HowsoError – If no Trainee with the requested ID can be found or loaded.

add_feature(trainee_id, feature, feature_value=None, *, condition=None, condition_session=None, feature_attributes=None, overwrite=False)#

Adds a feature to a trainee.

Parameters:
  • trainee_id (str) – The ID of the Trainee to add the feature to.

  • feature (str) – The name of the feature.

  • feature_attributes (dict, optional) – The dict of feature specific attributes for this feature. If unspecified and conditions are not specified, will assume feature type as ‘continuous’.

  • feature_value (int or float or str, optional) – The value to populate the feature with. By default, populates the new feature with None.

  • condition (dict, optional) –

    A condition map where feature values will only be added when certain criteria are met.

    If None, the feature will be added to all cases in the model and feature metadata will be updated to include it. If specified as an empty dict, the feature will still be added to all cases in the model but the feature metadata will not be updated.

    Note

    The dictionary keys are the feature name and values are one of:

    • None

    • A value, must match exactly.

    • An array of two numeric values, specifying an inclusive range. Only applicable to continuous and numeric ordinal features.

    • An array of string values, must match any of these values exactly. Only applicable to nominal and string ordinal features.

    Tip

    For instance to add the feature_value only when the length and width features are equal to 10:

    condition = {"length": 10, "width": 10}
    

  • condition_session (str, optional) – If specified, ignores the condition and operates on cases for the specified session id.

  • overwrite (bool, default False) – If True, the feature will be over-written if it exists.
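
The matching rules for a condition map (exact value, inclusive numeric range, any-of list of strings) can be illustrated with a small pure-Python matcher. This is a sketch under stated assumptions, not how the engine evaluates conditions: `matches` is a hypothetical helper, a case is modeled as a plain dict, and the None form is deliberately left unimplemented because its exact semantics are not spelled out here.

```python
from numbers import Number


def matches(case, condition):
    """Return True when case satisfies every entry of condition."""
    for feature, expected in condition.items():
        if expected is None:
            raise NotImplementedError("None conditions are not modeled here")
        actual = case.get(feature)
        if isinstance(expected, (list, tuple)):
            # Two numbers are read as an inclusive [low, high] range.
            is_range = len(expected) == 2 and all(
                isinstance(v, Number) and not isinstance(v, bool)
                for v in expected
            )
            if is_range:
                if not isinstance(actual, Number):
                    return False
                if not expected[0] <= actual <= expected[1]:
                    return False
            elif actual not in expected:
                # Otherwise the list is a set of allowed values.
                return False
        elif actual != expected:
            return False
    return True
```

With the tip's `{"length": 10, "width": 10}` condition, only cases whose length and width both equal 10 would match.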

analyze(trainee_id, context_features=None, action_features=None, *, bypass_calculate_feature_residuals=None, bypass_calculate_feature_weights=None, bypass_hyperparameter_analysis=None, dt_values=None, use_case_weights=None, inverse_residuals_as_weights=None, k_folds=None, k_values=None, num_analysis_samples=None, num_samples=None, analysis_sub_model_size=None, analyze_level=None, p_values=None, targeted_model=None, use_deviations=None, weight_feature=None, **kwargs)#

Analyzes a trainee.

Parameters:
  • trainee_id (str) – The ID of the Trainee.

  • context_features (iterable of str, optional) – The context features to analyze for.

  • action_features (iterable of str, optional) – The action features to analyze for.

  • k_folds (int, optional) – The number of cross-validation folds to do. Defaults to 6.

  • bypass_hyperparameter_analysis (bool, optional) – Bypasses hyperparameter analysis.

  • bypass_calculate_feature_residuals (bool, optional) – Bypasses feature residual calculation.

  • bypass_calculate_feature_weights (bool, optional) – Bypasses calculation of feature weights.

  • use_deviations (bool, optional) – Uses deviations for the LK metric in queries.

  • num_samples (int, optional) – Used in calculating feature residuals.

  • k_values (list of int, optional) – List used in hyperparameter search.

  • p_values (list of float, optional) – List used in hyperparameter search.

  • dt_values (list of float, optional) – List used in hyperparameter search.

  • analyze_level (int, optional) –

    If specified, will analyze for the following flows:

    1. predictions/accuracy (hyperparameters)

    2. data synth (cache: global residuals)

    3. standard details

    4. full analysis

  • targeted_model ({"omni_targeted", "single_targeted", "targetless"}) –

    Optional. Valid values are as follows:

    • "single_targeted": analyze hyperparameters for the specified action_features.

    • "omni_targeted": analyze hyperparameters for each context feature as an action feature; ignores the action_features parameter.

    • "targetless": analyze hyperparameters for all context features as possible action features; ignores the action_features parameter.

  • num_analysis_samples (int, optional) – If the dataset is too large, analyze on a (randomly sampled) subset of the data. The num_analysis_samples specifies the number of observations to be considered for analysis.

  • analysis_sub_model_size (int or None, optional) – Number of samples to use for analysis. The rest will be randomly held-out and not included in calculations.

  • inverse_residuals_as_weights (bool, default False) – When True, will compute and use the inverse of residuals as feature weights.

  • use_case_weights (bool, default False) – When True will scale influence weights by each case’s weight_feature weight.

  • weight_feature (str, optional) – Name of feature whose values to use as case weights. When left unspecified uses the internally managed case weight.

  • kwargs – Additional experimental analyze parameters.

append_to_series_store(trainee_id, series, contexts, *, context_features=None)#

Append the specified contexts to a series store.

For use with train series.

Parameters:
  • trainee_id (str) – The ID of the Trainee to append to.

  • series (str) – The name of the series store to append to.

  • contexts (list of list of object or pandas.DataFrame) – The list of list of context values to append to the series.

  • context_features (iterable of str, optional) – The list of feature names for contexts.

auto_analyze(trainee_id)#

Auto-analyze the trainee model.

Re-uses all parameters from the previous analyze or set_auto_analyze_params call. If analyze or set_auto_analyze_params has not been previously called, auto_analyze will default to a robust and versatile analysis.

Parameters:

trainee_id (str) – The ID of the Trainee to auto-analyze.

begin_session(name='default', metadata=None)#

Begin a new session.

Parameters:
  • name (str, default "default") – The name of the session.

  • metadata (dict, optional) – Any key-value pair to store as custom metadata for the session.

Returns:

The new session instance.

Return type:

howso.openapi.models.Session

Raises:

TypeError – If name is non-None and not a string or metadata is non-None and not a dictionary.

check_name_valid_for_save(file_path, clobber=False)#

Ensure that the given filename is a valid name for the host OS.

Parameters:
  • file_path (Path or str) – The full path of the desired Trainee.

  • clobber (bool, default False) – If True, checks will pass if the file is writable even if it already exists.

Return type:

Tuple[bool, str]

Returns:

  • bool – True if the path has a valid filename, points to a file rather than a directory, and the process (user) has sufficient permissions. When clobber is False, the file must also not already exist.

  • str – The reason. If the return is True, this will be ‘OK’.
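
To show the shape of the (bool, str) result, here is a rough stand-in built on the standard library. It is not the client's implementation: `check_name` is hypothetical, the failure messages are invented for the example (only 'OK' comes from the description above), and OS-specific rules about illegal filename characters are not covered.

```python
import os
from pathlib import Path


def check_name(file_path, clobber=False):
    """Return (ok, reason) for a candidate save path."""
    path = Path(file_path)
    if not path.name:
        return False, "No filename given"
    if path.is_dir():
        return False, "Path is a directory"
    parent = path.parent
    if not parent.is_dir():
        return False, "Parent directory does not exist"
    if path.exists():
        # An existing file is only acceptable when clobber is True.
        if not clobber:
            return False, "File already exists"
        if not os.access(path, os.W_OK):
            return False, "File is not writable"
    elif not os.access(parent, os.W_OK):
        return False, "Directory is not writable"
    return True, "OK"
```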

check_version()#

Check if there is a more recent version.

Return type:

Optional[str]

compute_feature_weights(trainee_id, action_feature=None, context_features=None, robust=False, weight_feature=None, use_case_weights=False)#

Compute and set feature weights for specified context and action features.

Parameters:
  • trainee_id (str) – The ID of the Trainee.

  • action_feature (str, optional) – Action feature for which to set the specified feature weights.

  • context_features (iterable of str) – List of context feature names.

  • robust (bool, default False) – When true, the power set/permutations of features are used as contexts to calculate the residual for a given feature. When false, the full set of features is used to calculate the residual for a given feature.

  • weight_feature (str, optional) – Name of feature whose values to use as case weights. When left unspecified uses the internally managed case weight.

  • use_case_weights (bool, default False) – If set to True will scale influence weights by each case’s weight_feature weight.

Returns:

A dictionary of computed context features -> weights

Return type:

dict

copy_subtrainee(trainee_id, new_trainee_name, *, source_id=None, source_name_path=None, target_id=None, target_name_path=None)#

Copy a subtrainee in trainee’s hierarchy.

Parameters:
  • trainee_id (str) – The id of the trainee whose hierarchy is to be modified.

  • new_trainee_name (str) – The name of the new Trainee.

  • source_id (str, optional) – Id of source trainee to copy. Ignored if source_name_path is specified. If neither source_name_path nor source_id are specified, copies the trainee itself.

  • source_name_path (list of str, optional) – List of strings specifying the user-friendly path of the child subtrainee to copy.

  • target_id (str, optional) – Id of target trainee to copy trainee into. Ignored if target_name_path is specified. If neither target_name_path nor target_id are specified, copies as a direct child of trainee.

  • target_name_path (list of str, optional) – List of strings specifying the user-friendly path of the child subtrainee to copy trainee into.

Return type:

None

copy_trainee(trainee_id, new_trainee_name=None, new_trainee_id=None, *, library_type=None, resources=None)#

Copies a trainee to a new trainee id in the Howso service.

Parameters:
  • trainee_id (str) – The trainee id of the trainee to be copied.

  • new_trainee_name (str, optional) – The name of the new Trainee.

  • new_trainee_id (str, optional) –

    The id of the new Trainee.

    If not provided, the id will be set to new_trainee_name (if provided), otherwise a new uuid4.

  • library_type (str, optional) – (Not Implemented) The library type of the Trainee. If not specified, the new trainee will inherit the value from the original.

  • resources (howso.openapi.models.TraineeResources or dict, optional) – (Not Implemented) Customize the resources provisioned for the Trainee instance. If not specified, the new trainee will inherit the value from the original.

Returns:

The Trainee object that was created.

Return type:

Trainee

Raises:

ValueError – If the Trainee could not be copied.
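
The fallback described for new_trainee_id (explicit id, then new_trainee_name, then a fresh uuid4) amounts to the following. This is a sketch of the documented order only; `resolve_new_trainee_id` is a hypothetical helper, not a client method.

```python
import uuid


def resolve_new_trainee_id(new_trainee_name=None, new_trainee_id=None):
    """Pick the id for a copied Trainee using the documented fallback order."""
    if new_trainee_id is not None:
        return new_trainee_id
    if new_trainee_name is not None:
        return new_trainee_name
    # Neither was given, so generate a new random id.
    return str(uuid.uuid4())
```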

create_trainee(trainee, *, library_type=None, max_wait_time=None, overwrite_trainee=False, resources=None)#

Create a Trainee on the Howso service.

A Trainee can be thought of as a “model” in the traditional ML sense.

Parameters:
  • trainee (Trainee) – A Trainee object defining the Trainee.

  • library_type ({"st", "mt"}, optional) – (Not implemented) The library type of the Trainee.

  • max_wait_time (int or float, default 30) – (Not implemented) The number of seconds to wait for a trainee to be created before aborting gracefully.

  • overwrite_trainee (bool, default False) – If True, and a trainee with id trainee.id already exists, the old trainee will be deleted and the given trainee created in its place.

  • resources (howso.openapi.models.TraineeResources or dict, optional) – (Not implemented) Customize the resources provisioned for the Trainee instance.

Returns:

The Trainee object that was created.

Return type:

Trainee

delete_trainee(trainee_id=None, file_path=None)#

This deletes the Trainee.

Includes all cases, model metadata, session data, persisted files, etc.

Parameters:
  • trainee_id (str, optional) – The ID of the Trainee. If a full file path is provided, trainee_id will only be used to delete from core.

  • file_path (Path or str, optional) –

    The path of the file to load the Trainee from. Used for deleting trainees from disk.

    The file path must end with a filename, but file path can be either an absolute path, a relative path or just the file name.

    If trainee_id is not provided, in addition to deleting from disk, will attempt to delete a Trainee from memory assuming the Trainee has the same name as the filename.

    If file_path is a relative path the absolute path will be computed appending the file_path to the CWD.

    If file_path is an absolute path, this is the absolute path that will be used.

    If file_path is just a filename, then the absolute path will be computed appending the filename to the CWD.
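
The three file_path cases above reduce to one join against the current working directory. The sketch below illustrates that rule with pathlib; `resolve_trainee_path` is a hypothetical helper, the `cwd` argument exists only to make the example reproducible, and the file names are made up.

```python
from pathlib import Path


def resolve_trainee_path(file_path, cwd=None):
    """Resolve file_path against the working directory as described above."""
    base = Path.cwd() if cwd is None else Path(cwd)
    # Joining onto an absolute path discards the base, so absolute
    # paths pass through unchanged; relative paths and bare file
    # names are appended to the working directory.
    return base / Path(file_path)
```

On a POSIX system with a working directory of `/work`, `saved/model.caml` resolves to `/work/saved/model.caml`, while `/data/model.caml` is returned as given.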

delete_trainee_session(trainee_id, session)#

Deletes a session from a trainee.

Parameters:
  • trainee_id (str) – The ID of the Trainee to delete the session from.

  • session (str) – The id of the session to remove.

edit_cases(trainee_id, feature_values, *, case_indices=None, condition=None, condition_session=None, features=None, num_cases=None, precision=None)#

Edit feature values for the specified cases.

Parameters:
  • trainee_id (str) – The ID of the Trainee to edit the cases of.

  • feature_values (list of object or pandas.DataFrame) – The feature values to edit the case(s) with. If specified as a list, the order corresponds with the order of the features parameter. If specified as a DataFrame, only the first row will be used.

  • case_indices (Iterable of Sequence[Union[str, int]], optional) – Iterable of Sequences containing the session id and index, where index is the original 0-based index of the case as it was trained into the session. This explicitly specifies the cases to edit. When specified, condition and condition_session are ignored.

  • condition (dict, optional) –

    A condition map to select which cases to edit. Ignored when case_indices are specified.

    Note

    The dictionary keys are the feature name and values are one of:

    • None

    • A value, must match exactly.

    • An array of two numeric values, specifying an inclusive range. Only applicable to continuous and numeric ordinal features.

    • An array of string values, must match any of these values exactly. Only applicable to nominal and string ordinal features.

  • condition_session (str, optional) – If specified, ignores the condition and operates on all cases for the specified session.

  • features (iterable of str, optional) – The names of the features to edit. Required when feature_values is not specified as a DataFrame.

  • num_cases (int, default None) – The maximum amount of cases to edit. If not specified, the limit will be k cases if precision is “similar”, or no limit if precision is “exact”.

  • precision ({"exact", "similar"}, optional) – The precision to use when selecting the cases to edit, defaults to “exact”.

Returns:

The number of cases modified.

Return type:

int

evaluate(trainee_id, features_to_code_map, *, aggregation_code=None)#

Evaluate custom code on feature values of all cases in the trainee.

Parameters:
  • trainee_id (str) – The ID of the Trainee.

  • features_to_code_map (dict of str to str) –

    A dictionary with feature name keys and custom Amalgam code string values.

    The custom code can use “#feature_name 0” to reference the value of that feature for each case.

  • aggregation_code (str, optional) – A string of custom Amalgam code that can access the list of values derived from the custom code in features_to_code_map. The custom code can use “#feature_name 0” to reference the list of values derived from using the custom code in features_to_code_map.

Returns:

A dictionary with keys: ‘evaluated’ and ‘aggregated’

’evaluated’ is a dictionary with feature name keys and lists of values derived from the features_to_code_map custom code.

’aggregated’ is None if no aggregation_code is given, it otherwise holds the output of the custom ‘aggregation_code’

Return type:

dict

execute_label(entity_id, label)#

Execute a label in the trainee.

Parameters:
  • entity_id (str) – The ID of the Trainee that contains the label to be executed.

  • label (str) – The name of the label to execute.

Return type:

object

Returns:

The raw response from the trainee.

export_trainee(trainee_id, path_to_trainee=None, decode_cases=False, separate_files=False)#

Export a saved Trainee’s data to json files for migration.

Parameters:
  • trainee_id (str) – The ID of the Trainee.

  • path_to_trainee (Path or str, optional) – The path to where the saved trainee file is located.

  • decode_cases (bool, default False) – Whether to export decoded cases.

  • separate_files (bool, default False) – Whether to load each case from its individual file.

get_auto_ablation_params(trainee_id)#

Get parameters set by set_auto_ablation_params().

get_cases(trainee_id, session=None, case_indices=None, indicate_imputed=False, features=None, condition=None, num_cases=None, precision=None)#

Retrieve cases from a model given a trainee id.

Parameters:
  • trainee_id (str) – The ID of the Trainee to retrieve cases from.

  • session (str, optional) –

    The session ID to retrieve cases for, in their trained order.

    NOTE: If a session is not provided, retrieves all feature values for cases for all (unordered) sessions in the order they were trained within each session.

  • case_indices (iterable of sequence of str, int, optional) – Iterable of Sequences, of session id and index, where index is the original 0-based index of the case as it was trained into the session. If specified, returns only these cases and ignores the session parameter.

  • indicate_imputed (bool, default False) – If set, an additional value will be appended to the cases indicating if the case was imputed.

  • features (iterable of str, optional) –

    A list of feature names to return values for in lieu of all default features.

    Built-in features that are available for retrieval:

    .session - The session id the case was trained under.
    .session_training_index - 0-based original index of the case, ordered by training during the session; is never changed.

  • condition (dict, optional) –

    The condition map to select the cases to retrieve that meet all the provided conditions.

    Note

    The dictionary keys are the feature name and values are one of:

    • None

    • A value, must match exactly.

    • An array of two numeric values, specifying an inclusive range. Only applicable to continuous and numeric ordinal features.

    • An array of string values, must match any of these values exactly. Only applicable to nominal and string ordinal features.

    Tip

    Example 1 - Retrieve all values belonging to feature_name:

    condition = {"feature_name": None}
    

    Example 2 - Retrieve cases that have the value 10:

    condition = {"feature_name": 10}
    

    Example 3 - Retrieve cases that have a value in range [10, 20]:

    condition = {"feature_name": [10, 20]}
    

    Example 4 - Retrieve cases that match one of [‘a’, ‘c’, ‘e’]:

    condition = {"feature_name": ['a', 'c', 'e']}
    

    Example 5 - Retrieve cases using session name and index:

    condition = {'.session': 'your_session_name',
                 '.session_training_index': 1}
    

  • num_cases (int, default None) – The maximum amount of cases to retrieve. If not specified, the limit will be k cases if precision is “similar”, or no limit if precision is “exact”.

  • precision ({"exact", "similar"}, optional) – The precision to use when retrieving the cases via condition. Options are “exact” or “similar”. If not provided, “exact” will be used.

Returns:

A cases object containing the feature names and cases.

Return type:

howso.openapi.models.Cases

get_distances(trainee_id, features=None, *, action_feature=None, case_indices=None, feature_values=None, use_case_weights=False, weight_feature=None)#

Compute distances matrix for specified cases.

Returns a dict with computed distances between all cases specified in case_indices or from all cases in local model as defined by feature_values. If neither case_indices nor feature_values is specified, returns computed distances for the entire dataset.

Parameters:
  • trainee_id (str) – The trainee ID.

  • features (iterable of str, optional) – List of feature names to use when computing distances. If unspecified uses all features.

  • action_feature (str, optional) – The action feature. If specified, uses targeted hyperparameters used to predict this action_feature, otherwise uses targetless hyperparameters.

  • case_indices (Iterable of Sequence[Union[str, int]], optional) – An Iterable of Sequences, of session id and index, where index is the original 0-based index of the case as it was trained into the session. If specified, returns distances for all of these cases. Ignored if feature_values is provided. If neither feature_values nor case_indices is specified, uses full dataset.

  • feature_values (list of object or DataFrame, optional) – If specified, returns distances of the local model relative to these values, ignores case_indices parameter. If provided a DataFrame, only the first row will be used.

  • use_case_weights (bool, default False) – If set to True, will scale influence weights by each case’s weight_feature weight.

  • weight_feature (str, optional) – Name of feature whose values to use as case weights. When left unspecified uses the internally managed case weight.

Returns:

A dict containing a matrix of computed distances and the list of corresponding case indices in the following format:

{
    'case_indices': [ session-indices ],
    'distances': [ [ distances ] ]
}

Return type:

dict
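
One way to consume a result in this format is to scan the matrix for the nearest pair of distinct cases. The sketch below assumes that row i and column j of 'distances' correspond to entries i and j of 'case_indices', and that the matrix is square; the sample values and the `closest_pair` helper are made up for illustration.

```python
def closest_pair(result):
    """Return (distance, case_a, case_b) for the nearest two distinct cases."""
    indices = result["case_indices"]
    matrix = result["distances"]
    best = None
    for i in range(len(indices)):
        # Only look above the diagonal so each pair is visited once.
        for j in range(i + 1, len(indices)):
            if best is None or matrix[i][j] < best[0]:
                best = (matrix[i][j], indices[i], indices[j])
    return best


# Hypothetical result for three cases trained in one session.
sample = {
    "case_indices": [["session-1", 0], ["session-1", 1], ["session-1", 2]],
    "distances": [
        [0.0, 2.5, 1.0],
        [2.5, 0.0, 4.0],
        [1.0, 4.0, 0.0],
    ],
}
```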

get_entities()#

Return a list of loaded core entities.

Returns:

The list of loaded entity names.

Return type:

iterable of str

get_extreme_cases(trainee_id, num, sort_feature, features=None)#

Gets the extreme cases of a trainee for the given feature(s).

Parameters:
  • trainee_id (str) – The ID of the Trainee to retrieve extreme cases from.

  • num (int) – The number of cases to get.

  • sort_feature (str) – The name of the feature by which extreme cases are sorted.

  • features (iterable of str, optional) – An iterable of feature names to use when getting extreme cases.

Returns:

A cases object containing the feature names and extreme cases.

Return type:

howso.openapi.models.Cases

get_feature_attributes(trainee_id)#

Get stored feature attributes.

Parameters:

trainee_id (str) – The ID of the Trainee.

Returns:

A dictionary of feature name to dictionary of feature attributes.

Return type:

dict

get_feature_contributions(trainee_id, action_feature, *, robust=None, directional=False, weight_feature=None)#

Get cached feature contributions.

All keyword arguments are optional. When not specified, will auto-select cached contributions for output. When specified, will attempt to output the cached contributions best matching the requested parameters, if no cached match is found.

Deprecated since version 1.0.0: Use HowsoDirectClient.get_prediction_stats() instead.

Parameters:
  • trainee_id (str) – The id or name of the trainee.

  • action_feature (str) – Will attempt to return contributions that were computed for this specified action_feature.

  • robust (bool, optional) – When specified, will attempt to return contributions that were computed with the specified robust or non-robust type.

  • directional (bool, default False) – If false returns absolute feature contributions. If true, returns directional feature contributions.

  • weight_feature (str, optional) – When specified, will attempt to return contributions that were computed using this weight_feature.

Returns:

A map of feature names to contribution values.

Return type:

dict of str to float

get_feature_conviction(trainee_id, *, features=None, action_features=None, familiarity_conviction_addition=True, familiarity_conviction_removal=False, weight_feature=None, use_case_weights=False)#

Get familiarity conviction for features in the model.

Parameters:
  • trainee_id (str) – The id of the trainee.

  • features (iterable of str, optional) – An iterable of feature names to calculate convictions. At least 2 features are required to get familiarity conviction. If not specified all features will be used.

  • action_features (iterable of str, optional) – An iterable of feature names to be treated as action features during conviction calculation in order to determine the conviction of each feature against the set of action_features. If not specified, conviction is computed for each feature against the rest of the features as a whole.

  • familiarity_conviction_addition (bool, default True) – Calculate and output familiarity conviction of adding the specified features in the output.

  • familiarity_conviction_removal (bool, default False) – Calculate and output familiarity conviction of removing the specified features in the output.

  • weight_feature (str, optional) – Name of feature whose values to use as case weights. When left unspecified uses the internally managed case weight.

  • use_case_weights (bool, default False) – If set to True will scale influence weights by each case’s weight_feature weight.

Returns:

A dict with familiarity_conviction_addition or familiarity_conviction_removal

Return type:

dict

get_feature_mda(trainee_id, action_feature, *, permutation=None, robust=None, weight_feature=None)#

Get cached feature Mean Decrease In Accuracy (MDA).

All keyword arguments are optional. When not specified, will auto-select cached MDA for output. When specified, will attempt to output the cached MDA best matching the requested parameters, if no cached match is found.

Deprecated since version 1.0.0: Use HowsoDirectClient.get_prediction_stats() instead.

Parameters:
  • trainee_id (str) – The id or name of the trainee.

  • action_feature (str) – Will attempt to return MDA that was computed for this specified action_feature.

  • permutation (bool, optional) – When False, will attempt to return MDA that was computed by dropping each feature. When True will attempt to return MDA that was computed with permutations by scrambling each feature.

  • robust (bool, optional) – When specified, will attempt to return MDA that was computed with the specified robust or non-robust type.

  • weight_feature (str, optional) – When specified, will attempt to return MDA that was computed using this weight_feature.

Returns:

A map of feature names to MDA values.

Return type:

dict of str to float

get_feature_residuals(trainee_id, *, action_feature=None, robust=None, robust_hyperparameters=None, weight_feature=None)#

Get cached feature residuals.

All keyword arguments are optional. When not specified, will auto-select cached residuals for output. When specified, will attempt to output the cached residuals best matching the requested parameters, if no cached match is found.

Deprecated since version 1.0.0: Use HowsoDirectClient.get_prediction_stats() instead.

Parameters:
  • trainee_id (str) – The id or name of the trainee.

  • action_feature (str, optional) – When specified, will attempt to return residuals that were computed for this specified action_feature. Note: “.targetless” is the action feature used during targetless analysis.

  • robust (bool, optional) – When specified, will attempt to return residuals that were computed with the specified robust or non-robust type.

  • robust_hyperparameters (bool, optional) – When specified, will attempt to return residuals that were computed using hyperparameters with the specified robust or non-robust type.

  • weight_feature (str, optional) – When specified, will attempt to return residuals that were computed using this weight_feature.

Returns:

A map of feature names to residual values.

Return type:

dict of str to float

get_hierarchy(trainee_id)#

Output the hierarchy for a trainee.

Parameters:

trainee_id (str) – The ID of the Trainee to get the hierarchy from.

Returns:

Dictionary of the currently contained hierarchy as a nested dict, with False for trainees that are stored independently.

Return type:

dict of {str: dict}
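
As a rough illustration of how a caller might walk the returned structure, the sketch below flattens a nested hierarchy dict into name paths. The helper `flatten_hierarchy` and the sample data are hypothetical, and the exact shape returned by get_hierarchy may differ from what is assumed here.

```python
def flatten_hierarchy(hierarchy, prefix=()):
    """Yield (path, stored_independently) pairs from a nested hierarchy dict.

    Assumes each child name maps either to a nested dict or to False for
    a trainee that is stored independently.
    """
    for name, child in hierarchy.items():
        path = prefix + (name,)
        if child is False:
            yield path, True
        else:
            yield path, False
            # Recurse into contained children, if any.
            yield from flatten_hierarchy(child or {}, path)


# Hypothetical hierarchy, for illustration only.
sample = {"child_a": {"grandchild": {}}, "child_b": False}
paths = list(flatten_hierarchy(sample))
```

Each resulting path is a tuple of names from the top-level trainee down to the child.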

get_label(entity_id, label)#

Get a label value from a Trainee.

Parameters:
  • entity_id (str) – The ID of the Trainee to get the label from.

  • label (str) – The label name to get the value from.

Returns:

The value of the label requested.

Return type:

object

get_marginal_stats(trainee_id, *, condition=None, num_cases=None, precision=None, weight_feature=None)#

Get marginal stats for all features.

Parameters:
  • trainee_id (str) – The ID of the Trainee to retrieve marginal stats for.

  • condition (dict or None, optional) –

    A condition map to select which cases to compute marginal stats for.

    Note

    The dictionary keys are the feature name and values are one of:

    • None

    • A value, must match exactly.

    • An array of two numeric values, specifying an inclusive range. Only applicable to continuous and numeric ordinal features.

    • An array of string values, must match any of these values exactly. Only applicable to nominal and string ordinal features.

  • num_cases (int, default None) – The maximum amount of cases to use to calculate marginal stats. If not specified, the limit will be k cases if precision is “similar”. Only used if condition is not None.

  • precision (str, default None) – The precision to use when selecting cases with the condition. Options are ‘exact’ or ‘similar’. If not specified “exact” will be used. Only used if condition is not None.

  • weight_feature (str, optional) – When specified, will attempt to return stats that were computed using this weight_feature.

Returns:

A map of feature names to map of stat type to stat values.

Return type:

dict of str to dict of str to float
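
To make the condition value types above concrete, here is a minimal pure-Python sketch of how such a map could be interpreted against in-memory cases. It is not the engine's implementation: `matches_condition` is a hypothetical helper, and None values are skipped because their semantics are not spelled out in this section.

```python
from numbers import Number


def matches_condition(case, condition):
    """Return True if a case (dict of feature -> value) meets every condition."""
    for feature, expected in condition.items():
        if expected is None:
            # Semantics of None are not described here, so it is not enforced.
            continue
        value = case.get(feature)
        if isinstance(expected, (list, tuple)):
            is_range = len(expected) == 2 and all(
                isinstance(v, Number) and not isinstance(v, bool)
                for v in expected
            )
            if is_range:
                # Two numbers: inclusive range.
                low, high = expected
                if not (isinstance(value, Number) and low <= value <= high):
                    return False
            elif value not in expected:
                # List of strings: value must equal one of them.
                return False
        elif value != expected:
            # Single value: exact match.
            return False
    return True


cases = [
    {"age": 25, "color": "red"},
    {"age": 40, "color": "blue"},
    {"age": 31, "color": "green"},
]
condition = {"age": [20, 35], "color": ["red", "green"]}
selected = [c for c in cases if matches_condition(c, condition)]
```

Here the first and third cases are selected, since both fall inside the age range and match one of the listed colors.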

get_num_training_cases(trainee_id)#

Return the number of trained cases in the model.

Parameters:

trainee_id (str) – The ID of the Trainee to retrieve the number of training cases from.

Returns:

The number of cases in the model

Return type:

int

get_pairwise_distances(trainee_id, features=None, *, action_feature=None, from_case_indices=None, from_values=None, to_case_indices=None, to_values=None, use_case_weights=False, weight_feature=None)#

Compute pairwise distances between specified cases.

Returns a list of computed distances between each respective pair of cases specified in either from_values or from_case_indices to to_values or to_case_indices. If only one case is specified in any of the lists, all respective distances are computed to/from that one case.

Note

  • One of from_values or from_case_indices must be specified, not both.

  • One of to_values or to_case_indices must be specified, not both.

Parameters:
  • trainee_id (str) – The trainee ID.

  • features (iterable of str, optional) – List of feature names to use when computing pairwise distances. If unspecified uses all features.

  • action_feature (str, optional) – The action feature. If specified, uses targeted hyperparameters used to predict this action_feature, otherwise uses targetless hyperparameters.

  • from_case_indices (Iterable of Sequence[Union[str, int]], optional) – An iterable of sequences, of session id and index, where index is the original 0-based index of the case as it was trained into the session. If specified must be either length of 1 or match length of to_values or to_case_indices.

  • from_values (list of list of object or pandas.DataFrame, optional) – A 2d-list of case values. If specified must be either length of 1 or match length of to_values or to_case_indices.

  • to_case_indices (Iterable of Sequence[Union[str, int]], optional) – An Iterable of Sequences, of session id and index, where index is the original 0-based index of the case as it was trained into the session. If specified must be either length of 1 or match length of from_values or from_case_indices.

  • to_values (list of list of object or pandas.DataFrame, optional) – A 2d-list of case values. If specified must be either length of 1 or match length of from_values or from_case_indices.

  • use_case_weights (bool, default False) – If set to True, will scale influence weights by each case’s weight_feature weight.

  • weight_feature (str, optional) – Name of feature whose values to use as case weights. When left unspecified uses the internally managed case weight.

Returns:

A list of computed pairwise distances between each corresponding pair of cases in from_case_indices and to_case_indices.

Return type:

list
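
The length rule for the "from" and "to" arguments can be sketched as follows. This only illustrates how cases are paired up; `pair_cases` is a hypothetical helper, and the actual distance computation is left to the Trainee.

```python
def pair_cases(from_cases, to_cases):
    """Pair 'from' and 'to' cases, repeating a single case as needed."""
    n_from, n_to = len(from_cases), len(to_cases)
    if n_from != n_to and 1 not in (n_from, n_to):
        raise ValueError("Lengths must match, or one side must have length 1.")
    size = max(n_from, n_to)
    if n_from == 1:
        from_cases = list(from_cases) * size
    if n_to == 1:
        to_cases = list(to_cases) * size
    return list(zip(from_cases, to_cases))


# One 'from' case compared against three 'to' cases.
pairs = pair_cases([[0, 0]], [[1, 1], [2, 2], [3, 3]])
```

With a single "from" case, every "to" case is paired with it, which gives three pairs in this example.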

get_params(trainee_id, *, action_feature=None, context_features=None, mode=None, weight_feature=None)#

Get the parameters used by the Trainee.

If ‘action_feature’, ‘context_features’, ‘mode’, or ‘weight_feature’ are specified, then the best hyperparameters analyzed in the Trainee are the value of the ‘hyperparameter_map’ key, otherwise this value will be the dictionary containing all the hyperparameter sets in the Trainee.

Parameters:
  • trainee_id (str) – The ID of the Trainee to get parameters from.

  • action_feature (str, optional) – If specified will return the best analyzed hyperparameters to target this feature.

  • context_features (str, optional) – If specified, will find and return the best analyzed hyperparameters to use with these context features.

  • mode (str, optional) – If specified, will find and return the best analyzed hyperparameters that were computed in this mode.

  • weight_feature (str, optional) – If specified, will find and return the best analyzed hyperparameters that were analyzed using this weight feature.

Returns:

A dict including either all of the Trainee’s internal parameters or only the best hyperparameters selected using the passed parameters.

Return type:

dict

get_prediction_stats(trainee_id, *, action_feature=None, condition=None, num_cases=None, num_robust_influence_samples_per_case=None, precision=None, robust=None, robust_hyperparameters=None, stats=None, weight_feature=None)#

Get feature prediction stats.

Gets cached stats when condition is None. If condition is not None, then uses the condition to select cases and computes prediction stats for that set of cases.

All keyword arguments are optional. When not specified, will auto-select all cached stats for output. When specified, will attempt to output the cached stats best matching the requested parameters, if no cached match is found.

Parameters:
  • trainee_id (str) – The id or name of the trainee.

  • action_feature (str, optional) –

    When specified, will attempt to return stats that were computed for this specified action_feature. Note: “.targetless” is the action feature used during targetless analysis.

    Note

    If get_prediction_stats is being used with time series analysis, the action feature for which the prediction statistics information is desired must be specified.

  • condition (dict or None, optional) –

    A condition map to select which cases to compute prediction stats for.

    Note

    The dictionary keys are the feature name and values are one of:

    • None

    • A value, must match exactly.

    • An array of two numeric values, specifying an inclusive range. Only applicable to continuous and numeric ordinal features.

    • An array of string values, must match any of these values exactly. Only applicable to nominal and string ordinal features.

  • num_cases (int, default None) – The maximum amount of cases to use to calculate prediction stats. If not specified, the limit will be k cases if precision is “similar”, or 1000 cases if precision is “exact”. Only used if condition is not None.

  • num_robust_influence_samples_per_case (int, optional) – Specifies the number of robust samples to use for each case for robust contribution computations. Defaults to 300 + 2 * (number of features).

  • precision (str, default None) – The precision to use when selecting cases with the condition. Options are ‘exact’ or ‘similar’. If not specified “exact” will be used. Only used if condition is not None.

  • robust (bool, optional) – When specified, will attempt to return stats that were computed with the specified robust or non-robust type.

  • robust_hyperparameters (bool, optional) – When specified, will attempt to return stats that were computed using hyperparameters with the specified robust or non-robust type.

  • stats (iterable of str, optional) –

    List of stats to output. When unspecified, returns all. Allowed values:

    • accuracy : The number of correct predictions divided by the total number of predictions.

    • confusion_matrix : A sparse map of actual feature value to a map of predicted feature value to counts.

    • contribution : Feature contributions to predicted value when each feature is dropped from the model, applies to all features.

    • mae : Mean absolute error. For continuous features, this is calculated as the mean of absolute values of the difference between the actual and predicted values. For nominal features, this is 1 - the average categorical action probability of each case’s correct classes. Categorical action probabilities are the probabilities for each class for the action feature.

    • mda : Mean decrease in accuracy when each feature is dropped from the model, applies to all features.

    • mda_permutation : Mean decrease in accuracy that used scrambling of feature values instead of dropping each feature, applies to all features.

    • missing_value_accuracy : The number of cases with missing values predicted to have missing values divided by the number of cases with missing values, applies to all features that contain missing values.

    • precision : Precision (positive predictive) value for nominal features only.

    • r2 : The r-squared coefficient of determination, for continuous features only.

    • recall : Recall (sensitivity) value for nominal features only.

    • rmse : Root mean squared error, for continuous features only.

    • spearman_coeff : Spearman’s rank correlation coefficient, for continuous features only.

    • mcc : Matthews correlation coefficient, for nominal features only.

  • weight_feature (str, optional) – When specified, will attempt to return stats that were computed using this weight_feature.

Returns:

A map of feature to map of stat type to stat values.

Return type:

dict of str to dict of str to float
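
For readers who want to check a few of the stat definitions above by hand, the following sketch computes accuracy, mae, rmse and r2 from plain lists. The function names and sample values are illustrative assumptions and are not taken from the engine.

```python
import math


def accuracy(actual, predicted):
    """Correct predictions divided by the total number of predictions."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error for a continuous feature."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def nominal_mae(actual, probabilities):
    """1 minus the average probability assigned to each case's correct class."""
    correct = [probs.get(a, 0.0) for a, probs in zip(actual, probabilities)]
    return 1 - sum(correct) / len(correct)


def rmse(actual, predicted):
    """Root mean squared error for a continuous feature."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r2(actual, predicted):
    """R-squared coefficient of determination."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Illustrative values only.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
```

The nominal variant of mae takes one probability map per case, mirroring the categorical action probabilities described in the mae entry.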

get_session(session_id)#

Retrieve a session.

Note

If multiple trainees are loaded, the session will be retrieved from the most recently loaded trainee that contains the requested session. (The metadata will include the trainee_id from which the session was retrieved.)

Parameters:

session_id (str) – The id of the session to retrieve.

Returns:

The session instance.

Return type:

howso.openapi.models.Session

get_sessions(search_terms=None)#

Return a list of all accessible sessions.

Note

Returns sessions from across all loaded trainees. (The metadata will include the trainee_id from which the session was retrieved.)

Parameters:

search_terms (str, optional) – Space or comma delimited search terms to filter results by.

Returns:

The listing of session instances.

Return type:

list of howso.openapi.models.Session
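
The "space or comma delimited" format of search_terms can be read as in the sketch below. It shows only how such a string might be split into terms; `split_search_terms` is a hypothetical helper, and how the client matches terms against sessions is not covered here.

```python
import re


def split_search_terms(search_terms):
    """Split a space or comma delimited string into individual terms."""
    if not search_terms:
        return []
    # Any run of commas and/or whitespace acts as a single delimiter.
    return [term for term in re.split(r"[,\s]+", search_terms) if term]


terms = split_search_terms("iris, demo  2024")
```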

get_substitute_feature_values(trainee_id, clear_on_get=True)#

Gets a substitution map for use in extended nominal generation.

Parameters:
  • trainee_id (str) – The ID of the Trainee to get the substitution feature values from.

  • clear_on_get (bool, default True) – Clears the substitution values map in the Trainee upon retrieving them. This is done if it is desired to prevent the substitution map from being persisted. If set to False the model will not be cleared which preserves substitution mappings if the model is saved; representing a potential privacy leak should the substitution map be made public.

Returns:

A dictionary of feature name to a dictionary of feature value to substitute feature value.

Return type:

dict of dict

get_trainee(trainee_id)#

Gets a trainee loaded in the Howso service.

Parameters:

trainee_id (str) – The id of the trainee.

Returns:

A Trainee object representing the Trainee.

Return type:

Trainee

get_trainee_information(trainee_id)#

Get information about the trainee.

Including trainee version and configuration parameters.

Parameters:

trainee_id (str) – The ID of the Trainee.

Returns:

The Trainee information.

Return type:

howso.openapi.models.TraineeInformation

get_trainee_metrics(trainee_id)#

This endpoint is not implemented for the direct Howso client.

Raises:

NotImplementedError – This endpoint is not implemented for the direct Howso client.

Return type:

Never

get_trainee_session_indices(trainee_id, session)#

Get list of all session indices for a specified session.

Parameters:
  • trainee_id (str) – The ID of the Trainee to get the session indices from.

  • session (str) – The id of the session to retrieve indices from.

Returns:

A list of the session indices for the session.

Return type:

list of int

get_trainee_session_training_indices(trainee_id, session)#

Get list of all session training indices for a specified session.

Parameters:
  • trainee_id (str) – The ID of the Trainee to get the session training indices from.

  • session (str) – The id of the session to retrieve indices from.

Returns:

A list of the session training indices for the session.

Return type:

list of int

get_trainee_sessions(trainee_id)#

Get the sessions of a trainee.

Parameters:

trainee_id (str) – The ID of the Trainee to get the list of sessions from.

Returns:

A list of dicts with keys “id” and “name” for each session in the Trainee.

Return type:

list of dict of str to str

Examples

>>> print(cl.get_trainee_sessions(trainee.id))
[{'id': '6c35e481-fb49-4178-a96f-fe4b5afe7af4', 'name': 'default'}]

get_trainees(search_terms=None)#

Return a list of all trainees.

Parameters:

search_terms (str) – Keywords to filter trainee list by.

Returns:

A list of the trainee identities.

Return type:

list of howso.openapi.models.TraineeIdentity

static get_unique_handle(handle)#

Append a unique 6 byte hex to the input handle.

Parameters:

handle (str) – String to which a unique 6 byte hex string will be appended.

Returns:

A unique alphanumeric handle consisting of the input string and a unique 6 byte hex string.

Return type:

str
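
One way to picture the result is sketched below, using the standard library to generate 6 random bytes rendered as hex. This is not the library's code: `make_unique_handle` is a hypothetical helper, the underscore separator is an assumption, and a random suffix is only very unlikely to collide rather than guaranteed to be unique.

```python
import secrets


def make_unique_handle(handle):
    """Append an underscore and 6 random bytes, hex-encoded, to the handle."""
    # token_hex(6) returns 12 hexadecimal characters (two per byte).
    return f"{handle}_{secrets.token_hex(6)}"


example = make_unique_handle("howso")
```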

get_version()#

Return the Howso version.

Returns:

A version response that contains the version data for the current instance of Howso.

Return type:

howso.openapi.models.ApiVersion

impute(trainee_id, features=None, features_to_impute=None, batch_size=1)#

Impute, or fill in the missing values, for the specified features.

If no ‘features’ are specified, will use all features in the trainee for imputation. If no ‘features_to_impute’ are specified, will impute all features specified by ‘features’.

Parameters:
  • trainee_id (str) – The ID of the Trainee to impute.

  • features (iterable of str, optional) –

    An iterable of feature names to use for imputation.

    If not specified, all features will be used.

  • features_to_impute (iterable of str, optional) – An iterable of feature names to impute. If not specified, features will be used (see above).

  • batch_size (int, default 1) –

    Larger batch size will increase speed but may decrease accuracy. Batch size indicates how many rows to fill before recomputing conviction.

    The default value (which is 1) should return the best accuracy but might be slower. Higher values should improve performance but may decrease accuracy of results.
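
The batch_size trade-off can be seen by counting how often conviction would be recomputed. The sketch below is a simplified model with a hypothetical `iter_batches` helper; it does not call the engine and assumes one recomputation per batch.

```python
def iter_batches(rows, batch_size=1):
    """Yield consecutive slices of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


rows_with_missing = list(range(10))  # stand-ins for rows that need imputing

# Under the stated assumption, each batch is followed by one recomputation.
recomputes_default = sum(1 for _ in iter_batches(rows_with_missing, batch_size=1))
recomputes_batched = sum(1 for _ in iter_batches(rows_with_missing, batch_size=4))
```

With ten rows, the default batch size triggers ten recomputations while a batch size of 4 triggers three, which is where the speed gain and the possible loss of accuracy come from.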

load_trainee(trainee_id)#

Load a Trainee that was persisted on the Howso service.

Deprecated since version 1.0.0: Use HowsoDirectClient.acquire_trainee_resources() instead.

Parameters:

trainee_id (str) – The ID of the Trainee to load.

move_cases(trainee_id, num_cases, *, case_indices=None, condition=None, condition_session=None, precision=None, preserve_session_data=False, source_id=None, source_name_path=None, target_name_path=None, target_id=None)#

Moves training cases from one trainee to another in the hierarchy.

Parameters:
  • trainee_id (str) – The identifier of the Trainee doing the moving.

  • num_cases (int) – The number of cases to move; minimum 1 case must be moved. Ignored if case_indices is specified.

  • case_indices (list of tuples) – A list of tuples containing session ID and session training index for each case to be moved.

  • condition (dict, optional) –

    The condition map to select the cases to move that meet all the provided conditions. Ignored if case_indices is specified.

    Note

    The dictionary keys are the feature name and values are one of:

    • None

    • A value, must match exactly.

    • An array of two numeric values, specifying an inclusive range. Only applicable to continuous and numeric ordinal features.

    • An array of string values, must match any of these values exactly. Only applicable to nominal and string ordinal features.

    Tip

    Example 1 - Move all values belonging to feature_name:

    condition = {"feature_name": None}

    Example 2 - Move cases that have the value 10:

    condition = {"feature_name": 10}

    Example 3 - Move cases that have a value in range [10, 20]:

    condition = {"feature_name": [10, 20]}

    Example 4 - Move cases that match one of [‘a’, ‘c’, ‘e’]:

    condition = {"feature_name": ['a', 'c', 'e']}

    Example 5 - Move cases using session name and index:

    condition = {'.session': 'your_session_name',
                 '.session_index': 1}

  • condition_session (str, optional) – If specified, ignores the condition and operates on cases for the specified session id. Ignored if case_indices is specified.

  • precision ({"exact", "similar"}, optional) – The precision to use when moving the cases. Options are ‘exact’ or ‘similar’. If not specified, “exact” will be used. Ignored if case_indices is specified.

  • preserve_session_data (bool, default False) – When True, will move cases without cleaning up session data.

  • source_id (str, optional) – The source trainee unique id from which to move cases. Ignored if source_name_path is specified. If neither source_name_path nor source_id are specified, moves cases from the trainee itself.

  • source_name_path (list of str, optional) – List of strings specifying the user-friendly path of the child subtrainee from which to move cases.

  • target_name_path (list of str, optional) – List of strings specifying the user-friendly path of the child subtrainee to move cases to.

  • target_id (str, optional) – The target trainee id to move the cases to. Ignored if target_name_path is specified. If neither target_name_path nor target_id are specified, moves cases to the trainee itself.

Returns:

The number of cases moved.

Return type:

int
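
The precedence between the name-path and id arguments described above can be summarized in a small sketch. `resolve_trainee` is a hypothetical helper that only mirrors the documented rules and applies equally to the source and target arguments.

```python
def resolve_trainee(name_path=None, trainee_id=None):
    """Return a (kind, value) pair describing which trainee is addressed."""
    if name_path is not None:
        # A name path takes precedence; the id is ignored.
        return ("name_path", list(name_path))
    if trainee_id is not None:
        return ("id", trainee_id)
    # Neither given: the trainee itself is used.
    return ("self", None)


source = resolve_trainee(name_path=["child_a"], trainee_id="ignored-id")
target = resolve_trainee()
```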

persist_trainee(trainee_id)#

Persists a Trainee in the Howso service storage.

After persisting, the Trainee resources can be released.

Parameters:

trainee_id (str) – The ID of the Trainee to persist.

Raises:

AssertionError – If the requested Trainee’s persistence is set to “never”.

react(trainee_id, *, action_features=None, actions=None, allow_nulls=False, batch_size=None, case_indices=None, contexts=None, context_features=None, derived_action_features=None, derived_context_features=None, desired_conviction=None, details=None, exclude_novel_nominals_from_uniqueness_check=False, feature_bounds_map=None, generate_new_cases='no', initial_batch_size=None, input_is_substituted=False, into_series_store=None, leave_case_out=False, new_case_threshold='min', num_cases_to_generate=1, ordered_by_specified_features=False, post_process_features=None, post_process_values=None, preserve_feature_values=None, progress_callback=None, substitute_output=True, suppress_warning=False, use_case_weights=False, use_regional_model_residuals=True, weight_feature=None)#

React to supplied values and cases contained within the Trainee.

If desired_conviction is not specified, executes a discriminative react: provided a list of context values, the trainee reacts to the model and produces predictions for the specified actions. If desired_conviction is specified, executes a generative react, which produces action_values for the specified action_features, conditioned on the optionally provided contexts.
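
The distinction can be sketched as below. The helper `react_mode` is hypothetical, and the keyword-argument dicts are illustrative values built from parameter names in this signature; nothing here calls the client.

```python
def react_mode(desired_conviction=None):
    """Name the kind of react implied by the desired_conviction argument."""
    return "discriminative" if desired_conviction is None else "generative"


# Illustrative keyword arguments; the feature names and values are made up.
discriminative_kwargs = {
    "contexts": [[1, 2, 3], [4, 5, 6]],
    "context_features": ["temperature", "humidity", "dew_point"],
    "action_features": ["rain_chance", "is_sunny"],
}
generative_kwargs = {
    "action_features": ["rain_chance", "is_sunny"],
    "desired_conviction": 5,  # illustrative value only
    "num_cases_to_generate": 3,
}

modes = (
    react_mode(discriminative_kwargs.get("desired_conviction")),
    react_mode(generative_kwargs.get("desired_conviction")),
)
```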

Parameters:
  • trainee_id (str) – The ID of the Trainee to react to.

  • contexts (list of list of object or DataFrame, optional) –

    A 2d list of context values to react to. If None for discriminative react, it is assumed that session and session_id keys are set in the details.

    >>> contexts = [[1, 2, 3], [4, 5, 6]]
    

  • action_features (iterable of str, optional) –

    An iterable of feature names to treat as action features during react.

    >>> action_features = ['rain_chance', 'is_sunny']
    

  • actions (list of list of object or DataFrame, optional) –

    One or more action values to use for action features. If specified, will only return the specified explanation details for the given actions. (Discriminative reacts only)

    >>> actions = [[1, 2, 3], [4, 5, 6]]
    

  • allow_nulls (bool, default False) – When true will allow return of null values if there are nulls in the local model for the action features, applicable only to discriminative reacts.

  • batch_size (int, optional) – Define the number of cases to react to at once. If left unspecified, the batch size will be determined automatically.

  • context_features (iterable of str, optional) –

    An iterable of feature names to treat as context features during react.

    >>> context_features = ['temperature', 'humidity', 'dew_point',
    ...                     'barometric_pressure']
    

  • derived_context_features (iterable of str, optional) – An iterable of feature names whose values should be computed from the provided context in the specified order. Must be different than context_features.

  • derived_action_features (iterable of str, optional) –

    An iterable of feature names whose values should be computed after generation from the generated case prior to output, in the specified order. Must be a subset of action_features.

    Note

    Both of these derived feature lists rely on the features’ “derived_feature_code” attribute to compute the values. If ‘derived_feature_code’ attribute is undefined or references non-0 feature indices, the derived value will be null.

  • input_is_substituted (bool, default False) – If True, assumes provided categorical (nominal or ordinal) feature values have already been substituted.

  • substitute_output (bool, default True) – If False, will not substitute categorical feature values. Only applicable if a substitution value map has been set.

  • details (dict, optional) –

    If details are specified, the response will contain the requested explanation data along with the reaction. Below are the valid keys and data types for the different audit details. Omitted keys, values set to None, or False values for Booleans will not be included in the audit data returned.

    • influential_cases : bool, optional

      If True outputs the most influential cases and their influence weights based on the surprisal of each case relative to the context being predicted among the cases. Uses only the context features of the reacted case.

    • influential_cases_familiarity_convictions : bool, optional

      If True outputs familiarity conviction of addition for each of the influential cases.

    • influential_cases_raw_weights : bool, optional

      If True outputs the surprisal for each of the influential cases.

    • hypothetical_values : dict, optional

      A dictionary of feature name to feature value. If specified, shows how a prediction could change in a what-if scenario where the influential cases’ context feature values are replaced with the specified values. Iterates over all influential cases, predicting the action features for each one using the updated hypothetical values. Outputs the predicted arithmetic over the influential cases for each action feature.

    • most_similar_cases : bool, optional

      If True outputs an automatically determined (when ‘num_most_similar_cases’ is not specified) relevant number of similar cases, which will first include the influential cases. Uses only the context features of the reacted case.

    • num_most_similar_cases : int, optional

      Outputs this manually specified number of most similar cases, which will first include the influential cases.

      NOTE: The maximum number of cases that can be queried is 1000.

    • num_most_similar_case_indices : int, optional

      Outputs the specified number of most similar case indices when ‘distance_ratio’ is also set to True.

      NOTE: The maximum number of cases that can be queried is ‘1000’.

    • num_robust_influence_samples_per_case : int, optional

      Specifies the number of robust samples to use for each case. Applicable only for computing robust feature contributions or robust case feature contributions. Defaults to 2000. Higher values will take longer but provide more stable results.

    • boundary_cases : bool, optional

      If True outputs an automatically determined (when ‘num_boundary_cases’ is not specified) relevant number of boundary cases. Uses both context and action features of the reacted case to determine the counterfactual boundary based on action features, which maximize the dissimilarity of action features while maximizing the similarity of context features. If action features aren’t specified, uses familiarity conviction to determine the boundary instead.

    • num_boundary_cases : int, optional

      Outputs this manually specified number of boundary cases.

      NOTE: The maximum number of cases that can be queried is ‘1000’.

    • boundary_cases_familiarity_convictions : bool, optional

      If True outputs familiarity conviction of addition for each of the boundary cases.

    • distance_ratio : bool, optional

      If True outputs the ratio of distance (relative surprisal) between this reacted case and its nearest case to the minimum distance (relative surprisal) in between the closest two cases in the local area. All distances are computed using only the specified context features.

    • distance_contribution : bool, optional

      If True outputs the distance contribution (expected total surprisal contribution) for the reacted case. Uses both context and action feature values.

    • similarity_conviction : bool, optional

      If True outputs similarity conviction for the reacted case. Uses both context and action feature values as the case values for all computations. This is defined as expected (local) distance contribution divided by reacted case distance contribution.

    • outlying_feature_values : bool, optional

      If True outputs the reacted case’s context feature values that are outside the min or max of the corresponding feature values of all the cases in the local model area. Uses only the context features of the reacted case to determine that area.

    • categorical_action_probabilities : bool, optional

      If True outputs probabilities for each class for the action. Applicable only to categorical action features.

    • derivation_parameters : bool, optional

      If True, outputs a dictionary of the parameters used in the react call. These include k, p, distance_transform, feature_weights, feature_deviations, nominal_class_counts, and use_irw.

      • k: the number of cases used for the local model.

      • p: the parameter for the Lebesgue space.

      • distance_transform: the distance transform used as an exponent to convert distances to raw influence weights.

      • feature_weights: the weight for each feature used in the distance metric.

      • feature_deviations: the deviation for each feature used in the distance metric.

      • nominal_class_counts: the number of unique values for each nominal feature. This is used in the distance metric.

      • use_irw: a flag indicating if feature weights were derived using inverse residual weighting.

    • observational_errors : bool, optional

      If True outputs observational errors for all features as defined in feature attributes.

    • robust_computation : bool, optional

      Deprecated. If specified, will overwrite the value of both ‘robust_residuals’ and ‘robust_influences’.

    • robust_residuals : bool, optional

      Default is false, uses leave-one-out for features (or cases, as needed) for all residual computations. When true, uses uniform sampling from the power set of all combinations of features (or cases, as needed) instead.

    • robust_influences : bool, optional

      Default is true. When true, uses uniform sampling from the power set of all combinations of features (or cases, as needed) for all MDA and contribution computations. When false, uses leave-one-out for features (or cases, as needed) instead.

    • features : list of str, optional

      A list of feature names specifying which features the per-feature details (residuals, contributions, MDA, etc.) are computed for. Restricting this list generally saves compute, but not when details are computed robustly. Details are computed for all context and action features if this value is not specified.

    • feature_residuals : bool, optional

      If True outputs feature residuals for all (context and action) features locally around the prediction. Uses only the context features of the reacted case to determine that area. Relies on ‘robust_residuals’ parameter to determine whether to do standard or robust computation.

    • feature_mda : bool, optional

      If True outputs each context feature’s mean decrease in accuracy of predicting the action feature given the context. Uses only the context features of the reacted case to determine that area. Relies on ‘robust_influences’ parameter to determine whether to do standard or robust computation.

    • feature_mda_ex_post : bool, optional

      If True outputs each context feature’s mean decrease in accuracy of predicting the action feature as an explanation detail given that the specified prediction was already made as specified by the action value. Uses both context and action features of the reacted case to determine that area. Relies on ‘robust_influences’ parameter to determine whether to do standard or robust computation.

    • feature_contributions : bool, optional

      If True outputs each context feature’s absolute and directional differences between the predicted action feature value and the predicted action feature value if each context feature were not in the model for all context features in the local model area. Relies on ‘robust_influences’ parameter to determine whether to do standard or robust computation. Directional feature contributions are returned under the key ‘directional_feature_contributions’.

    • case_feature_contributions : bool, optional

      If True outputs each context feature’s absolute and directional differences between the predicted action feature value and the predicted action feature value if each context feature were not in the model for all context features in this case, using only the values from this specific case. Relies on ‘robust_influences’ parameter to determine whether to do standard or robust computation. Directional case feature contributions are returned under the ‘case_directional_feature_contributions’ key.

    • case_mda : bool, optional

      If True outputs each influential case’s mean decrease in accuracy of predicting the action feature in the local model area, as if each individual case were included versus not included. Uses only the context features of the reacted case to determine that area. Relies on ‘robust_influences’ parameter to determine whether to do standard or robust computation.

    • case_contributions : bool, optional

      If True outputs each influential case’s differences between the predicted action feature value and the predicted action feature value if each individual case were not included. Uses only the context features of the reacted case to determine that area. Relies on ‘robust_influences’ parameter to determine whether to do standard or robust computation.

    • case_feature_residuals : bool, optional

      If True outputs feature residuals for all (context and action) features for just the specified case. Uses leave-one-out for each feature, while using the others to predict the left out feature with their corresponding values from this case. Relies on ‘robust_residuals’ parameter to determine whether to do standard or robust computation.

    • local_case_feature_residual_convictions : bool, optional

      If True outputs this case’s feature residual convictions for the region around the prediction. Uses only the context features of the reacted case to determine that region. Computed as: region feature residual divided by case feature residual. Relies on ‘robust_residuals’ parameter to determine whether to do standard or robust computation.

    • global_case_feature_residual_convictions : bool, optional

      If True outputs this case’s feature residual convictions for the global model. Computed as: global model feature residual divided by case feature residual. Relies on ‘robust_residuals’ parameter to determine whether to do standard or robust computation.

    • generate_attempts : bool, optional

      If True outputs the number of attempts taken to generate each case. Only applicable when ‘generate_new_cases’ is “always” or “attempt”.

    >>> details = {'num_most_similar_cases': 5,
    ...            'feature_residuals': True}
    

  • desired_conviction (float) – If specified, will execute a generative react. If not specified, will execute a discriminative react. Conviction is the ratio of expected surprisal to generated surprisal for each feature generated. Valid values are in the range \((0, \infty)\).

  • weight_feature (str, optional) – Name of feature whose values to use as case weights. When left unspecified uses the internally managed case weight.

  • use_case_weights (bool, default False) – If set to True will scale influence weights by each case’s weight_feature weight.

  • case_indices (Iterable of Sequence[Union[str, int]], defaults to None) – An Iterable of Sequences, of session id and index, where index is the original 0-based index of the case as it was trained into the session. If this case does not exist, discriminative react outputs null, generative react ignores it.

  • preserve_feature_values (iterable of str) – List of features that will preserve their values from the case specified by case_indices, appending and overwriting the specified contexts as necessary. For generative reacts, if case_indices isn’t specified will preserve feature values of a random case.

  • leave_case_out (bool, default False) – If set to True and specified along with case_indices, each individual react will respectively ignore the corresponding case specified by case_indices by leaving it out.

  • initial_batch_size (int, optional) – Define the number of cases to react to in the first batch. If unspecified, the value of the react_initial_batch_size property is used. The number of cases in following batches will be automatically adjusted. This value is ignored if batch_size is specified.

  • into_series_store (str, optional) – The name of a series store. If specified, will store an internal record of all react contexts for this session and series to be used later with train series.

  • use_regional_model_residuals (bool) – If False, uses model feature residuals. If True, recalculates regional model residuals.

  • feature_bounds_map (dict of dict) –

    A mapping of feature names to the bounds for the feature values to be generated in. For continuous features this should be a numeric value, for datetimes this should be a datetime string. Min bounds should be equal to or smaller than max bounds, except when setting the bounds around the cycle length of a cyclic feature (e.g., to allow 0 +/- 60 degrees, set min=300 and max=60).

    Example feature bounds map:#
    {
        "feature_a": {"min": 0},
        "feature_b": {"min": 1, "max": 5},
        "feature_c": {"max": 1}
    }
    

  • generate_new_cases ({"always", "attempt", "no"}, default "no") –

    (Optional) Whether to generate new cases.

    This parameter takes in a string equal to one of the following:

    1. “attempt”

      Synthesizer attempts to generate new cases, and if it’s not possible to generate a new case, it might generate cases in “no” mode (see point 3).

    2. “always”

      Synthesizer always generates new cases, and if it’s not possible to generate a new case, it returns None.

    3. “no”

      Synthesizer generates data based on the desired_conviction specified, and the generated data is not guaranteed to be a new case (that is, a case not found in the original dataset).

  • ordered_by_specified_features (bool, default False) – If True order of generated feature values will match the order of specified features.

  • num_cases_to_generate (int, default 1) – The number of cases to generate.

  • suppress_warning (bool, defaults to False) – If True, warnings will not be displayed.

  • post_process_features (iterable of str, optional) – List of feature names that will be made available during the execution of post_process feature attributes.

  • post_process_values (list of list of object or DataFrame, optional) – A 2d list of values corresponding to post_process_features that will be made available during the execution of post_process feature attributes.

  • progress_callback (callable, optional) – A callback method that will be called before each batched call to react and at the end of reacting. The method is given a ProgressTimer containing metrics on the progress and timing of the react operation, and the batch result.

  • new_case_threshold (str, optional) –

    Distance to determine the privacy cutoff. If None, will default to “min”.

    Possible values:

    • min: minimum distance in the original local space.

    • max: maximum distance in the original local space.

    • most_similar: distance from the nearest neighbor to its own nearest neighbor in the original space.

  • exclude_novel_nominals_from_uniqueness_check (bool, default False) – If True, will exclude features which have a subtype defined in their feature attributes from the uniqueness check that happens when generate_new_cases is “always” or “attempt”. Only applies to generative reacts.
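
The wrap-around rule for cyclic features in feature_bounds_map can be hard to picture. The following is a minimal sketch and not part of the Howso API: within_bounds is a hypothetical helper, and it assumes that a bounds entry whose min is greater than its max describes an interval that wraps past the end of the cycle. The "heading" feature is also made up for illustration.

```python
def within_bounds(value, bounds):
    """Return True if value satisfies a {"min": ..., "max": ...} entry.

    Hypothetical helper for illustration only. Assumes that min > max
    marks a wrap-around interval on a cyclic feature.
    """
    low = bounds.get("min")
    high = bounds.get("max")
    if low is not None and high is not None and low > high:
        # Wrap-around interval, e.g. min=300 and max=60 on a 360 degree cycle.
        return value >= low or value <= high
    if low is not None and value < low:
        return False
    if high is not None and value > high:
        return False
    return True


feature_bounds_map = {
    "feature_a": {"min": 0},
    "feature_b": {"min": 1, "max": 5},
    "heading": {"min": 300, "max": 60},  # hypothetical cyclic feature
}

print(within_bounds(3, feature_bounds_map["feature_b"]))   # True
print(within_bounds(350, feature_bounds_map["heading"]))   # True
print(within_bounds(100, feature_bounds_map["heading"]))   # False
```

A map like this would be passed as the feature_bounds_map argument of a generative react; how the engine enforces the bounds internally is not described here.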

Returns:

A MutableMapping (dict-like) with these keys -> values:
action -> pandas.DataFrame

A data frame of action values.

details -> Dict or List

An aggregated list of any requested details.

Return type:

Reaction

Raises:
  • ValueError – If derived_action_features is not a subset of action_features. If new_case_threshold is not one of {“max”, “min”, “most_similar”}. If the number of context values does not match the number of context features.

  • HowsoError – If num_cases_to_generate is not an integer greater than 0.
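
The documented error conditions can be checked before a call is made. The sketch below is an assumption-laden illustration, not the client's own validation: check_react_arguments is a hypothetical helper that mirrors the conditions listed under Raises, uses ValueError throughout (the client raises HowsoError for num_cases_to_generate), and reports whether the call would be generative or discriminative based on desired_conviction.

```python
import math

VALID_NEW_CASE_THRESHOLDS = {"min", "max", "most_similar"}


def check_react_arguments(
    *,
    context_features=None,
    contexts=None,
    action_features=None,
    derived_action_features=None,
    desired_conviction=None,
    new_case_threshold="min",
    num_cases_to_generate=1,
):
    """Hypothetical pre-flight check; returns the kind of react requested."""
    if derived_action_features and not (
        set(derived_action_features) <= set(action_features or [])
    ):
        raise ValueError("derived_action_features must be a subset of action_features")
    if new_case_threshold not in VALID_NEW_CASE_THRESHOLDS:
        raise ValueError("new_case_threshold must be 'min', 'max' or 'most_similar'")
    if contexts is not None and context_features is not None:
        for row in contexts:
            if len(row) != len(context_features):
                raise ValueError("each context row needs one value per context feature")
    # bool is a subclass of int, so it is excluded explicitly.
    if (
        isinstance(num_cases_to_generate, bool)
        or not isinstance(num_cases_to_generate, int)
        or num_cases_to_generate < 1
    ):
        raise ValueError("num_cases_to_generate must be an integer greater than 0")
    if desired_conviction is None:
        return "discriminative"
    if not 0 < desired_conviction < math.inf:
        raise ValueError("desired_conviction must lie in the open interval (0, inf)")
    return "generative"


print(check_react_arguments(context_features=["a", "b"], contexts=[[1, 2]]))
print(check_react_arguments(desired_conviction=5.0, num_cases_to_generate=10))
```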

react_group(trainee_id, new_cases, *, features=None, distance_contributions=False, familiarity_conviction_addition=True, familiarity_conviction_removal=False, kl_divergence_addition=False, kl_divergence_removal=False, p_value_of_addition=False, p_value_of_removal=False, weight_feature=None, use_case_weights=False)#

Computes specified data for a set of cases.

Return the list of familiarity convictions (and optionally, distance contributions or p values) for each set.

Parameters:
  • trainee_id (str) – The trainee id.

  • new_cases (list of list of list of object or list of DataFrame) –

    Specify a set using a list of cases to compute the conviction of groups of cases as shown in the following example.

    >>> [ [[1, 2, 3], [4, 5, 6], [7, 8, 9]], # Group 1
    ...   [[1, 2, 3]] ] # Group 2
    

  • features (iterable of str, optional) – An iterable of feature names to consider while calculating convictions.

  • distance_contributions (bool, default False) – Calculate and output distance contribution ratios in the output dict for each case.

  • familiarity_conviction_addition (bool, default True) – Calculate and output familiarity conviction of adding the specified cases.

  • familiarity_conviction_removal (bool, default False) – Calculate and output familiarity conviction of removing the specified cases.

  • kl_divergence_addition (bool, default False) – Calculate and output KL divergence of adding the specified cases.

  • kl_divergence_removal (bool, default False) – Calculate and output KL divergence of removing the specified cases.

  • p_value_of_addition (bool, default False) – If true will output p value of addition.

  • p_value_of_removal (bool, default False) – If true will output p value of removal.

  • weight_feature (str, optional) – Name of feature whose values to use as case weights. When left unspecified uses the internally managed case weight.

  • use_case_weights (bool, default False) – If set to True will scale influence weights by each case’s weight_feature weight.

Returns:

The react response.

Return type:

dict
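
The new_cases argument is nested three levels deep: a list of groups, each group a list of cases, each case a list of values. As a rough sketch, the hypothetical helper below (check_new_cases is not part of the Howso API) walks that nesting and reports the size of each group. It assumes each case carries one value per name in features, which the text above does not state explicitly, and the feature names are made up.

```python
def check_new_cases(new_cases, features):
    """Return the number of cases per group after checking the nesting.

    Hypothetical helper. Assumes every case holds one value per feature.
    """
    group_sizes = []
    for group_index, group in enumerate(new_cases):
        for case_index, case in enumerate(group):
            if len(case) != len(features):
                raise ValueError(
                    f"group {group_index}, case {case_index}: expected "
                    f"{len(features)} values, got {len(case)}"
                )
        group_sizes.append(len(group))
    return group_sizes


features = ["f1", "f2", "f3"]  # hypothetical feature names
new_cases = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],  # Group 1
    [[1, 2, 3]],                        # Group 2
]
print(check_new_cases(new_cases, features))  # [3, 1]
```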

react_into_features(trainee_id, *, distance_contribution=False, familiarity_conviction_addition=False, familiarity_conviction_removal=False, features=None, influence_weight_entropy=False, p_value_of_addition=False, p_value_of_removal=False, similarity_conviction=False, use_case_weights=False, weight_feature=None)#

Calculate and cache conviction and other statistics.

Parameters:
  • trainee_id (str) – The ID of the Trainee to calculate and store conviction for.

  • features (iterable of str, optional) – An iterable of feature names to calculate convictions for.

  • familiarity_conviction_addition (bool or str, default False) – The name of the feature to store conviction of addition values. If set to True the values will be stored to the feature ‘familiarity_conviction_addition’.

  • familiarity_conviction_removal (bool or str, default False) – The name of the feature to store conviction of removal values. If set to True the values will be stored to the feature ‘familiarity_conviction_removal’.

  • influence_weight_entropy (bool or str, default False) – The name of the feature to store influence weight entropy values in. If set to True, the values will be stored in the feature ‘influence_weight_entropy’.

  • p_value_of_addition (bool or str, default False) – The name of the feature to store p value of addition values. If set to True the values will be stored to the feature ‘p_value_of_addition’.

  • p_value_of_removal (bool or str, default False) – The name of the feature to store p value of removal values. If set to True the values will be stored to the feature ‘p_value_of_removal’.

  • similarity_conviction (bool or str, default False) – The name of the feature to store similarity conviction values. If set to True the values will be stored to the feature ‘similarity_conviction’.

  • distance_contribution (bool or str, default False) – The name of the feature to store distance contribution. If set to True the values will be stored to the feature ‘distance_contribution’.

  • weight_feature (str, optional) – Name of feature whose values to use as case weights. When left unspecified uses the internally managed case weight.

  • use_case_weights (bool, default False) – If set to True will scale influence weights by each case’s weight_feature weight.
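
Each of the bool-or-str options above follows the same rule: True stores the statistic under its default feature name, a string stores it under that name, and False skips it. The sketch below only illustrates that rule; resolve_target_features is a hypothetical helper and does not call the Howso client.

```python
STATISTIC_OPTIONS = (
    "distance_contribution",
    "familiarity_conviction_addition",
    "familiarity_conviction_removal",
    "influence_weight_entropy",
    "p_value_of_addition",
    "p_value_of_removal",
    "similarity_conviction",
)


def resolve_target_features(**options):
    """Map each requested statistic to the feature name it is stored in.

    Hypothetical helper for illustration only.
    """
    targets = {}
    for name, value in options.items():
        if name not in STATISTIC_OPTIONS:
            raise ValueError(f"Unknown option: {name}")
        if value is True:
            targets[name] = name  # default feature name matches the option name
        elif isinstance(value, str):
            targets[name] = value  # caller-chosen feature name
        # False (the default) means the statistic is not stored.
    return targets


print(resolve_target_features(
    familiarity_conviction_addition=True,
    similarity_conviction="sim_conv",  # hypothetical custom feature name
    p_value_of_addition=False,
))
```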

react_into_trainee(trainee_id, *, action_feature=None, context_features=None, contributions=None, contributions_robust=None, hyperparameter_param_path=None, mda=None, mda_permutation=None, mda_robust=None, mda_robust_permutation=None, num_robust_influence_samples=None, num_robust_residual_samples=None, num_robust_influence_samples_per_case=None, num_samples=None, residuals=None, residuals_robust=None, sample_model_fraction=None, sub_model_size=None, use_case_weights=False, weight_feature=None)#

Compute and cache specified feature prediction stats.

Parameters:
  • trainee_id (str) – The ID of the Trainee to react to.

  • action_feature (str, optional) – Name of target feature for which to do computations. Default is whatever the model was analyzed for, e.g., action feature for MDA and contributions, or “.targetless” if analyzed for targetless. This parameter is required for MDA or contributions computations.

  • context_features (iterable of str, optional) – List of feature names to use as contexts for computations. Default is all trained non-unique features if unspecified.

  • contributions (bool, optional) – For each context_feature, use the full set of all other context_features to compute the mean absolute delta between prediction of action_feature with and without the context_feature in the model. False removes cached values.

  • contributions_robust (bool, optional) – For each context_feature, use the robust (power set/permutation) set of all other context_features to compute the mean absolute delta between prediction of action_feature with and without the context_feature in the model. False removes cached values.

  • hyperparameter_param_path (iterable of str, optional) – Full path for hyperparameters to use for computation. If specified for any residual computations, takes precedence over action_feature parameter. Can be set to a ‘paramPath’ value from the results of ‘get_params()’ for a specific set of hyperparameters.

  • mda (bool, optional) – When True will compute Mean Decrease in Accuracy (MDA) for each context feature at predicting the action_feature. Drop each feature and use the full set of remaining context features for each prediction. False removes cached values.

  • mda_permutation (bool, optional) – Compute MDA by scrambling each feature and using the full set of remaining context features for each prediction. False removes cached values.

  • mda_robust (bool, optional) – Compute MDA by dropping each feature and using the robust (power set/permutations) set of remaining context features for each prediction. False removes cached values.

  • mda_robust_permutation (bool, optional) – Compute MDA by scrambling each feature and using the robust (power set/permutations) set of remaining context features for each prediction. False removes cached values.

  • num_robust_influence_samples (int, optional) – Total sample size of model to use (using sampling with replacement) for robust contribution computation. Defaults to 300.

  • num_robust_residual_samples (int, optional) – Total sample size of model to use (using sampling with replacement) for robust mda and residual computation. Defaults to 1000 * (1 + log(number of features)). Note: robust mda will be updated to use num_robust_influence_samples in a future release.

  • num_robust_influence_samples_per_case (int, optional) – Specifies the number of robust samples to use for each case for robust contribution computations. Defaults to 300 + 2 * (number of features).

  • num_samples (int, optional) – Total sample size of model to use (using sampling with replacement) for all non-robust computation. Defaults to 1000. If specified overrides sample_model_fraction.

  • residuals (bool, optional) – For each context_feature, use the full set of all other context_features to predict the feature. When True computes and caches MAE (mean absolute error), R^2, RMSE (root mean squared error), and Spearman Coefficient for continuous features, and MAE, accuracy, precision, recall, and Matthews correlation coefficient for nominal features. False removes cached values.

  • residuals_robust (bool, optional) – For each context_feature, computes and caches the same stats as residuals but using the robust (power set/permutations) set of all other context_features to predict the feature. False removes cached values.

  • sample_model_fraction (float, optional) – A value between 0.0 and 1.0 giving the fraction of the model to use in sampling (using sampling without replacement). Applicable only to non-robust computation. Ignored if num_samples is specified. Higher values provide better accuracy at the cost of compute time.

  • sub_model_size (int, optional) – Subset of model to use for calculations. Applicable only to models > 1000 cases.

  • use_case_weights (bool, default False) – If set to True will scale influence weights by each case’s weight_feature weight.

  • weight_feature (str, optional) – Name of feature whose values to use as case weights. When left unspecified uses the internally managed case weight.

Return type:

None
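
The default sample sizes quoted above are simple formulas, and the precedence between num_samples and sample_model_fraction is easy to get backwards. The following sketch restates them in code under stated assumptions: all three function names are hypothetical, "log" is taken to be the natural logarithm, no rounding is applied to the residual default, and the sampled fraction is assumed to be a fraction of a known number of cases (model_size). None of these details are confirmed by the text.

```python
import math


def default_robust_residual_samples(num_features):
    """1000 * (1 + log(number of features)); natural log is an assumption."""
    return 1000 * (1 + math.log(num_features))


def default_robust_influence_samples_per_case(num_features):
    """300 + 2 * (number of features)."""
    return 300 + 2 * num_features


def resolve_num_samples(num_samples=None, sample_model_fraction=None, model_size=None):
    """Pick the non-robust sample size; num_samples wins over the fraction."""
    if num_samples is not None:
        return num_samples
    if sample_model_fraction is not None:
        if not 0.0 <= sample_model_fraction <= 1.0:
            raise ValueError("sample_model_fraction must be between 0.0 and 1.0")
        if model_size is None:
            raise ValueError("model_size is needed to apply a fraction")
        # Assumption: the fraction applies to the number of trained cases.
        return int(sample_model_fraction * model_size)
    return 1000  # documented default for num_samples


print(default_robust_influence_samples_per_case(10))  # 320
print(resolve_num_samples(num_samples=200, sample_model_fraction=0.5, model_size=1000))  # 200
print(resolve_num_samples(sample_model_fraction=0.25, model_size=2000))  # 500
```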

react_series(trainee_id, *, action_features=None, actions=None, batch_size=None, case_indices=None, contexts=None, context_features=None, continue_series=False, continue_series_features=None, continue_series_values=None, derived_action_features=None, derived_context_features=None, desired_conviction=None, details=None, exclude_novel_nominals_from_uniqueness_check=False, feature_bounds_map=None, final_time_steps=None, generate_new_cases='no', init_time_steps=None, initial_batch_size=None, initial_features=None, initial_values=None, input_is_substituted=False, leave_case_out=False, max_series_lengths=None, new_case_threshold='min', num_series_to_generate=1, ordered_by_specified_features=False, output_new_series_ids=True, preserve_feature_values=None, progress_callback=None, series_context_features=None, series_context_values=None, series_id_tracking='fixed', series_index=None, series_stop_maps=None, substitute_output=True, suppress_warning=False, use_case_weights=False, use_regional_model_residuals=True, weight_feature=None)#

React in a series until a series_stop_map condition is met.

Aggregates rows of data corresponding to the specified context, action, derived_context and derived_action features, utilizing previous rows to derive values as necessary. Outputs a dict of “action_features” and corresponding “action” where “action” is the completed ‘matrix’ for the corresponding action_features and derived_action_features.

Parameters:
  • trainee_id (str) – The ID of the Trainee to react to.

  • num_series_to_generate (int, default 1) – The number of series to generate.

  • final_time_steps (list of object, optional) – The time steps at which to end synthesis. Time-series only. Must provide either one for all series, or exactly one per series.

  • init_time_steps (list of object, optional) – The time steps at which to begin synthesis. Time-series only. Must provide either one for all series, or exactly one per series.

  • initial_features (iterable of str, optional) – List of features to condition just the first case in a series, overwrites context_features and derived_context_features for that first case. All specified initial features must be in one of: context_features, action_features, derived_context_features or derived_action_features. If provided a value that isn’t in one of those lists, it will be ignored.

  • initial_values (list of list of object, optional) – 2d list of values corresponding to the initial_features, used to condition just the first case in each series. Must provide either one for all series, or exactly one per series.

  • series_stop_maps (list of dict of dict, optional) –

    A dictionary of feature name to stop conditions. Must provide either one for all series, or exactly one per series.

    Tip

    Stop series when value exceeds max or is smaller than min:

    {"feature_name": {"min": 1, "max": 2}}
    

    Stop series when feature value matches any of the values listed:

    {"feature_name": {"values": ["val1", "val2"]}}
    

  • max_series_lengths (list of int, optional) – Maximum size a series is allowed to be. Default is 3 * model_size; a value of 0 or less means no limit. If forecasting with continue_series, this defines the maximum length of the forecast. Must provide either one for all series, or exactly one per series.

  • continue_series (bool, default False) –

    When True will attempt to continue existing series instead of starting new series. If initial_values provide series IDs, it will continue those explicitly specified IDs, otherwise it will randomly select series to continue.

    Note

    Terminated series with terminators cannot be continued and will result in null output.
    

  • continue_series_features (list of str, optional) – The list of feature names corresponding to the values in each row of continue_series_values. This value is ignored if continue_series_values is None.

  • continue_series_values (list of list of list of object or list of pandas.DataFrame, default None) – The set of series data to be forecasted with feature values in the same order defined by continue_series_features. The value of continue_series will be ignored and treated as true if this value is specified.

  • derived_context_features (iterable of str, optional) – List of context features whose values should be computed from the entire series in the specified order. Must be different than context_features.

  • derived_action_features (iterable of str, optional) –

    List of action features whose values should be computed from the resulting last row in series, in the specified order. Must be a subset of action_features.

    Note

    Both of these derived feature lists rely on the features’ “derived_feature_code” attribute to compute the values. If “derived_feature_code” attribute references non-existing feature indices, the derived value will be null.

  • exclude_novel_nominals_from_uniqueness_check (bool, default False) – If True, will exclude features which have a subtype defined in their feature attributes from the uniqueness check that happens when generate_new_cases is “always” or “attempt”. Only applies to generative reacts.

  • series_context_features (iterable of str, optional) – List of context features corresponding to series_context_values, if specified must not overlap with any initial_features or context_features.

  • series_context_values (list of list of list of object or list of DataFrame, optional) – 3d-list of context values, one for each feature for each row for each series. If specified, max_series_lengths are ignored.

  • output_new_series_ids (bool, default True) – If True, series ids are replaced with unique values on output. If False, will maintain or replace ids with existing trained values, but also allows output of series with duplicate existing ids.

  • series_id_tracking ({"dynamic", "fixed", "no"}, default "fixed") –

    Controls how closely generated series should follow existing series (plural).

    Choices are: “fixed” , “dynamic” or “no”:

    • If “fixed”, tracks the particular relevant series ID.

    • If “dynamic”, tracks the particular relevant series ID, but is allowed to change the series ID that it tracks based on its current context.

    • If “no”, does not track any particular series ID.

  • series_index (str, optional) – When set to a string, will include the series index as a column in the returned DataFrame using the column name given. If set to None, no column will be added.

  • progress_callback (callable, optional) – A callback method that will be called before each batched call to react series and at the end of reacting. The method is given a ProgressTimer containing metrics on the progress and timing of the react series operation, and the batch result.

  • batch_size (int, optional) – Define the number of series to react to at once. If left unspecified, the batch size will be determined automatically.

  • initial_batch_size (int, optional) – The number of series to react to in the first batch. If unspecified, the number will be determined automatically. The number of series in following batches will be automatically adjusted. This value is ignored if batch_size is specified.

  • contexts (list of list of object or DataFrame) – See parameter contexts in HowsoDirectClient.react().

  • action_features (iterable of str) – See parameter action_features in HowsoDirectClient.react().

  • actions (list of list of object or DataFrame) – See parameter actions in HowsoDirectClient.react().

  • context_features (iterable of str) – See parameter context_features in HowsoDirectClient.react().

  • input_is_substituted (bool, default False) – See parameter input_is_substituted in HowsoDirectClient.react().

  • substitute_output (bool) – See parameter substitute_output in HowsoDirectClient.react().

  • details (dict, optional) – See parameter details in HowsoDirectClient.react().

  • desired_conviction (float) – See parameter desired_conviction in HowsoDirectClient.react().

  • weight_feature (str) – See parameter weight_feature in HowsoDirectClient.react().

  • use_case_weights (bool) – See parameter use_case_weights in HowsoDirectClient.react().

  • case_indices (iterable of sequence of str, int) – See parameter case_indices in HowsoDirectClient.react().

  • preserve_feature_values (iterable of str) – See parameter preserve_feature_values in HowsoDirectClient.react().

  • new_case_threshold (str) – See parameter new_case_threshold in HowsoDirectClient.react().

  • leave_case_out (bool) – See parameter leave_case_out in HowsoDirectClient.react().

  • use_regional_model_residuals (bool) – See parameter use_regional_model_residuals in HowsoDirectClient.react().

  • feature_bounds_map (dict of dict) – See parameter feature_bounds_map in HowsoDirectClient.react().

  • generate_new_cases ({"always", "attempt", "no"}) – See parameter generate_new_cases in HowsoDirectClient.react().

  • ordered_by_specified_features (bool) – See parameter ordered_by_specified_features in HowsoDirectClient.react().

  • suppress_warning (bool) – See parameter suppress_warning in HowsoDirectClient.react().
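
Several parameters above (final_time_steps, init_time_steps, initial_values, series_stop_maps, max_series_lengths) share the rule "either one for all series, or exactly one per series". A minimal sketch of that rule follows; expand_per_series is a hypothetical helper, and how the client applies the rule internally is not stated in this reference.

```python
def expand_per_series(values, num_series, name="value"):
    """Expand a one-for-all list to one entry per series.

    Hypothetical helper for illustration only.
    """
    if values is None:
        return [None] * num_series
    if len(values) == 1:
        # A single entry applies to every series.
        return list(values) * num_series
    if len(values) == num_series:
        return list(values)
    raise ValueError(
        f"{name}: expected 1 or {num_series} entries, got {len(values)}"
    )


print(expand_per_series([50], 3, name="max_series_lengths"))          # [50, 50, 50]
print(expand_per_series([10, 20, 30], 3, name="max_series_lengths"))  # [10, 20, 30]
```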

Returns:

A MutableMapping (dict-like) with these keys -> values:
action -> pandas.DataFrame

A data frame of action values.

details -> Dict or List

An aggregated list of any requested details.

Return type:

Reaction

Raises:
  • ValueError – If the number of provided context values does not match the length of context features. If series_context_values is not a 3d list of objects. If continue_series_values is not a 3d list of objects. If derived_action_features is not a subset of action_features. If new_case_threshold is not one of {“max”, “min”, “most_similar”}.

  • HowsoError – If num_series_to_generate is not an integer greater than 0.
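
The stop conditions described for series_stop_maps can be read as a small predicate over the latest generated row. The sketch below is one interpretation, not the engine's implementation: should_stop is a hypothetical helper, and it assumes a series stops as soon as any listed feature goes strictly below min, strictly above max, or equals one of the listed values. The feature names are made up.

```python
def should_stop(row, stop_map):
    """Return True if the row triggers any stop condition in stop_map.

    Hypothetical helper. row maps feature names to the latest values.
    """
    for feature, rule in stop_map.items():
        if feature not in row:
            continue
        value = row[feature]
        if "values" in rule and value in rule["values"]:
            return True
        if "min" in rule and value < rule["min"]:
            return True
        if "max" in rule and value > rule["max"]:
            return True
    return False


stop_map = {
    "level": {"min": 1, "max": 2},            # hypothetical continuous feature
    "status": {"values": ["val1", "val2"]},   # hypothetical nominal feature
}
print(should_stop({"level": 1.5, "status": "running"}, stop_map))  # False
print(should_stop({"level": 2.5, "status": "running"}, stop_map))  # True
print(should_stop({"level": 1.5, "status": "val2"}, stop_map))     # True
```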

release_trainee_resources(trainee_id)#

Release a trainee’s resources from the Howso service.

Parameters:

trainee_id (str) – The ID of the Trainee to release resources for.

Raises:

HowsoError – If the requested Trainee has a persistence of “never”.

remove_cases(trainee_id, num_cases, *, case_indices=None, condition=None, condition_session=None, distribute_weight_feature=None, precision=None, preserve_session_data=False)#

Removes training cases from a Trainee.

The training cases will be completely purged from the model and the model will behave as if it had never been trained with them.

Parameters:
  • trainee_id (str) – The ID of the Trainee to remove cases from.

  • num_cases (int) – The number of cases to remove; minimum 1 case must be removed. Ignored if case_indices is specified.

  • case_indices (list of tuples) – A list of tuples containing session ID and session training index for each case to be removed.

  • condition (dict of str to object, optional) –

    The condition map to select the cases to remove that meet all the provided conditions. Ignored if case_indices is specified.

    Note

    The dictionary keys are the feature name and values are one of:

    • None

    • A value, must match exactly.

    • An array of two numeric values, specifying an inclusive range. Only applicable to continuous and numeric ordinal features.

    • An array of string values, must match any of these values exactly. Only applicable to nominal and string ordinal features.

    Tip

    Example 1 - Remove all values belonging to feature_name:

    condition = {"feature_name": None}
    

    Example 2 - Remove cases that have the value 10:

    condition = {"feature_name": 10}
    

    Example 3 - Remove cases that have a value in range [10, 20]:

    condition = {"feature_name": [10, 20]}
    

    Example 4 - Remove cases that match one of [‘a’, ‘c’, ‘e’]:

    condition = {"feature_name": ['a', 'c', 'e']}
    

  • condition_session (str, optional) – If specified, ignores the condition and operates on cases for the specified session id. Ignored if case_indices is specified.

  • distribute_weight_feature (str, optional) – When specified, will distribute the removed cases’ weights from this feature into their neighbors.

  • precision ({"exact", "similar"}, optional) – The precision to use when removing the cases, defaults to “exact”. Ignored if case_indices is specified.

  • preserve_session_data (bool, default False) – When True, will remove cases without cleaning up session data.

Returns:

The number of cases removed.

Return type:

int

Raises:

ValueError – If num_cases is not at least 1.
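
The condition value forms listed for remove_cases can be sketched as a local filter over plain dictionaries. This only illustrates the documented matching rules and is not the engine's implementation: `matches` and `select_cases` are hypothetical helpers, and `None` entries are rejected because this section does not spell out how they are interpreted.

```python
def _is_number(value):
    """True for ints and floats, but not bools."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def matches(value, rule):
    """Check one case value against one condition entry (hypothetical helper)."""
    if rule is None:
        # Assumption: None entries are not modeled in this sketch.
        raise ValueError("None rules are not modeled in this sketch")
    if isinstance(rule, (list, tuple)):
        if len(rule) == 2 and all(_is_number(r) for r in rule):
            # Two numeric values: an inclusive range.
            low, high = rule
            return _is_number(value) and low <= value <= high
        if all(isinstance(r, str) for r in rule):
            # String values: match any of them exactly.
            return value in rule
        raise ValueError(f"Unsupported condition entry: {rule!r}")
    # Any other value must match exactly.
    return value == rule


def select_cases(cases, condition):
    """Return the indices of cases that meet all the provided conditions."""
    return [
        index
        for index, case in enumerate(cases)
        if all(
            feature in case and matches(case[feature], rule)
            for feature, rule in condition.items()
        )
    ]


cases = [
    {"length": 4, "class": "a"},
    {"length": 12, "class": "c"},
    {"length": 20, "class": "b"},
]
print(select_cases(cases, {"length": [10, 20]}))        # [1, 2]
print(select_cases(cases, {"class": ["a", "c", "e"]}))  # [0, 1]
print(select_cases(cases, {"length": [10, 20], "class": ["a", "c", "e"]}))  # [1]
```

Because every entry in the condition map must hold, adding a second feature narrows the selection rather than widening it.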

remove_feature(trainee_id, feature, *, condition=None, condition_session=None)#

Removes a feature from a trainee.

Parameters:
  • trainee_id (str) – The ID of the Trainee to remove the feature from.

  • feature (str) – The name of the feature to remove.

  • condition (dict, optional) –

    A condition map where features will only be removed when certain criteria are met.

    If None, the feature will be removed from all cases in the model and feature metadata will be updated to exclude it. If specified as an empty dict, the feature will still be removed from all cases in the model but the feature metadata will not be updated.

    Note

    The dictionary keys are the feature name and values are one of:

    • None

    • A value, must match exactly.

    • An array of two numeric values, specifying an inclusive range. Only applicable to continuous and numeric ordinal features.

    • An array of string values, must match any of these values exactly. Only applicable to nominal and string ordinal features.

    Tip

    For instance, to remove the length feature only when the value is between 1 and 5:

    condition = {"length": [1, 5]}
    

  • condition_session (str, optional) – If specified, ignores the condition and operates on cases for the specified session id.
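
The difference between passing `None`, an empty dict, and a populated condition to remove_feature can be sketched with a local simulation. This is an illustration on plain Python data, not the engine's behavior in full: `remove_feature_locally` and `in_ranges` are hypothetical helpers, and only inclusive-range conditions are modeled.

```python
def in_ranges(case, condition):
    """True if every conditioned feature falls inside its inclusive range."""
    return all(
        feature in case and low <= case[feature] <= high
        for feature, (low, high) in condition.items()
    )


def remove_feature_locally(cases, feature_attributes, feature, condition=None):
    """Simulate the documented behavior on copies of local data."""
    cases = [dict(case) for case in cases]
    attributes = dict(feature_attributes)
    for case in cases:
        # None and {} both remove the feature from all cases.
        if not condition or in_ranges(case, condition):
            case.pop(feature, None)
    if condition is None:
        # Only condition=None also updates the feature metadata.
        attributes.pop(feature, None)
    return cases, attributes


cases = [{"length": 3, "width": 1}, {"length": 9, "width": 2}]
attributes = {"length": {"type": "continuous"}, "width": {"type": "continuous"}}

print(remove_feature_locally(cases, attributes, "length"))
print(remove_feature_locally(cases, attributes, "length", condition={}))
print(remove_feature_locally(cases, attributes, "length", condition={"length": [1, 5]}))
```

In the third call only the first case loses its length value, and the metadata still describes the feature.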

remove_series_store(trainee_id, series=None)#

Clear any stored series from the Trainee.

Parameters:
  • trainee_id (str) – The ID of the Trainee to remove the series store from.

  • series (str, optional) –

    The ID of the series to clear.

    If None, the Trainee’s entire series store will be cleared.

rename_subtrainee(trainee_id, new_name, *, child_id=None, child_name_path=None)#

Renames a contained child trainee in the hierarchy.

Parameters:
  • trainee_id (str) – The ID of the Trainee whose child to rename.

  • new_name (str) – New name of the child trainee.

  • child_id (str, optional) – Unique id of the child trainee to rename. Ignored if child_name_path is specified.

  • child_name_path (list of str, optional) – List of strings specifying the user-friendly path of the child subtrainee to rename.

Return type:

None

report_version(task)#

Report to end-user that there is a newer version available.

set_auto_ablation_params(trainee_id, auto_ablation_enabled=False, *, auto_ablation_weight_feature='.case_weight', conviction_lower_threshold=None, conviction_upper_threshold=None, exact_prediction_features=None, influence_weight_entropy_threshold=0.6, minimum_model_size=1000, relative_prediction_threshold_map=None, residual_prediction_features=None, tolerance_prediction_threshold_map=None, **kwargs)#

Set trainee parameters for auto ablation.

Note

Auto-ablation is experimental and the API may change without deprecation.

Parameters:
  • trainee_id (str) – The ID of the Trainee to set auto ablation parameters for.

  • auto_ablation_enabled (bool, default False) – When True, the train() method will ablate cases that meet the set criteria.

  • auto_ablation_weight_feature (str, default ".case_weight") – The weight feature that should be accumulated to when cases are ablated.

  • minimum_model_size (int, default 1000) – The threshold of the minimum number of cases at which the model should auto-ablate.

  • influence_weight_entropy_threshold (float, default 0.6) – The influence weight entropy quantile that a case must be beneath in order to be trained.

  • exact_prediction_features (Optional[List[str]], optional) – For each of the features specified, will ablate a case if the prediction matches exactly.

  • residual_prediction_features (Optional[List[str]], optional) – For each of the features specified, will ablate a case if abs(prediction - case value) / prediction <= feature residual.

  • tolerance_prediction_threshold_map (Optional[Dict[str, Tuple[float, float]]], optional) – For each of the features specified, will ablate a case if the prediction >= (case value - MIN) and the prediction <= (case value + MAX).

  • relative_prediction_threshold_map (Optional[Dict[str, float]], optional) – For each of the features specified, will ablate a case if abs(prediction - case value) / prediction <= relative threshold.

  • conviction_lower_threshold (Optional[float], optional) – The conviction value above which cases will be ablated.

  • conviction_upper_threshold (Optional[float], optional) – The conviction value below which cases will be ablated.
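
The inequalities behind tolerance_prediction_threshold_map and relative_prediction_threshold_map can be written out as a small sketch. The functions below are hypothetical helpers that follow the formulas as documented here; how the engine combines several features, and what it does when the prediction is zero, are not stated, so the zero case returning False is an assumption.

```python
def within_tolerance(prediction, case_value, tolerance):
    """Tolerance check: (case value - MIN) <= prediction <= (case value + MAX)."""
    t_min, t_max = tolerance
    return (case_value - t_min) <= prediction <= (case_value + t_max)


def within_relative(prediction, case_value, threshold):
    """Relative check: abs(prediction - case value) / prediction <= threshold."""
    if prediction == 0:
        # Assumption: the ratio is undefined here, so the sketch reports False.
        return False
    return abs(prediction - case_value) / prediction <= threshold


# Map shapes as described in the parameter list.
tolerance_prediction_threshold_map = {"length": (1.0, 2.0)}
relative_prediction_threshold_map = {"width": 0.1}

prediction = {"length": 10.5, "width": 110.0}
case = {"length": 10.0, "width": 100.0}

for feature, tolerance in tolerance_prediction_threshold_map.items():
    print(feature, within_tolerance(prediction[feature], case[feature], tolerance))
for feature, threshold in relative_prediction_threshold_map.items():
    print(feature, within_relative(prediction[feature], case[feature], threshold))
```

Here the length prediction of 10.5 lies inside [9.0, 12.0], and the width ratio is 10 / 110, which is below 0.1, so both checks pass.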

set_auto_analyze_params(trainee_id, auto_analyze_enabled=False, analyze_threshold=None, *, auto_analyze_limit_size=None, analyze_growth_factor=None, **kwargs)#

Set trainee parameters for auto analysis.

Parameters:
  • trainee_id (str) – The ID of the Trainee to set auto analysis parameters for.

  • auto_analyze_enabled (bool, default False) – When True, the train() method will trigger an analyze when it’s time for the model to be analyzed again.

  • analyze_threshold (int, optional) – The threshold for the number of cases at which the model should be re-analyzed.

  • auto_analyze_limit_size (int, optional) – The size of the model at which to stop doing auto-analysis. Value of 0 means no limit.

  • analyze_growth_factor (float, optional) – The factor by which to increase the analyze threshold every time the model grows to the current threshold size.

  • kwargs (dict, optional) – Parameters specific for analyze() may be passed in via kwargs, and will be cached and used during future auto-analysis.
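
How analyze_threshold, analyze_growth_factor and auto_analyze_limit_size interact can be sketched as a schedule of model sizes at which analysis would be triggered. This is a rough illustration under stated assumptions: the threshold is taken to grow multiplicatively, fractional sizes are rounded down, and `analyze_schedule` and `max_steps` are hypothetical names rather than part of the client.

```python
def analyze_schedule(analyze_threshold, analyze_growth_factor,
                     auto_analyze_limit_size=0, max_steps=10):
    """List the model sizes at which auto-analysis would run (assumed model)."""
    thresholds = []
    current = analyze_threshold
    for _ in range(max_steps):
        # A limit of 0 means no limit, as documented.
        if auto_analyze_limit_size and current > auto_analyze_limit_size:
            break
        thresholds.append(current)
        # Assumption: the threshold is multiplied by the growth factor.
        current = int(current * analyze_growth_factor)
    return thresholds


print(analyze_schedule(100, 2.0, auto_analyze_limit_size=1000))  # [100, 200, 400, 800]
print(analyze_schedule(100, 2.0, max_steps=3))                   # [100, 200, 400]
```

A larger growth factor spaces the analyses further apart as the model grows, which trades freshness of the analysis for less time spent analyzing.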

set_feature_attributes(trainee_id, feature_attributes)#

Sets feature attributes for a Trainee.

Parameters:
  • trainee_id (str) – The ID of the Trainee.

  • feature_attributes (dict of str to dict) –

    A dict of dicts of feature attributes. Each key is the feature ‘name’ and each value is a dict of feature-specific parameters.

    Example:

    {
        "length": { "type" : "continuous", "decimal_places": 1 },
        "width": { "type" : "continuous", "significant_digits": 4 },
        "degrees": { "type" : "continuous", "cycle_length": 360 },
        "class": { "type" : "nominal" }
    }
    

set_label(entity_id, label, label_value)#

Set a label value in the trainee.

Parameters:
  • entity_id (str) – The ID of the Trainee containing/to contain the label.

  • label (str) – The name of the label.

  • label_value (object) – The value to set to the label.

set_params(trainee_id, params)#

Sets specific hyperparameters in the trainee.

Parameters:
  • trainee_id (str) – The ID of the Trainee to set hyperparameters for.

  • params (dict) –

    A dictionary in the following format containing the hyperparameter information, which is required, and other parameters which are all optional.

    Example:

    {
        "hyperparameter_map": {
            ".targetless": {
                "robust": {
                    ".none": {
                        "dt": -1, "p": .1, "k": 8
                    }
                }
            }
        },
    }
    

set_random_seed(trainee_id, seed)#

Sets the random seed for the trainee.

Parameters:
  • trainee_id (str) – The ID of the Trainee to set the random seed for.

  • seed (int or float or str) – The random seed. Ex: 7998, "bobtherandomseed"

set_substitute_feature_values(trainee_id, substitution_value_map)#

Set a Trainee’s substitution map for use in extended nominal generation.

Parameters:
  • trainee_id (str) – The ID of the Trainee to set substitute feature values for.

  • substitution_value_map (dict) – A dictionary of feature name to a dictionary of feature value to substitute feature value.
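
The shape of substitution_value_map (feature name, then feature value, then substitute value) can be illustrated with a minimal sketch. The feature names and values are made up for the example, and `substitute_case` is a hypothetical helper showing what a lookup through the map looks like; the engine applies substitutions itself.

```python
# Feature name -> {feature value -> substitute feature value}.
substitution_value_map = {
    "fruit": {"apple": "fruit_1", "pear": "fruit_2"},
}


def substitute_case(case, features, value_map):
    """Replace each value that has a substitute; leave all others untouched."""
    return [
        value_map.get(feature, {}).get(value, value)
        for feature, value in zip(features, case)
    ]


features = ["fruit", "weight"]
print(substitute_case(["apple", 150], features, substitution_value_map))  # ['fruit_1', 150]
print(substitute_case(["plum", 80], features, substitution_value_map))    # ['plum', 80]
```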

train(trainee_id, cases, features=None, *, accumulate_weight_feature=None, batch_size=None, derived_features=None, initial_batch_size=None, input_is_substituted=False, progress_callback=None, series=None, skip_auto_analyze=False, train_weights_only=False, validate=True)#

Train one or more cases into a trainee (model).

Parameters:
  • trainee_id (str) – The ID of the target Trainee.

  • cases (list of list of object or pandas.DataFrame) – One or more cases to train into the model.

  • features (iterable of str, optional) –

    An iterable of feature names. This parameter should be provided in the following scenarios:

    1. When cases are not in the format of a DataFrame, or the DataFrame does not define named columns.

    2. You want to train only a subset of columns defined in your cases DataFrame.

    3. You want to re-order the columns that are trained.

  • accumulate_weight_feature (str, optional) – Name of feature into which to accumulate neighbors’ influences as weight for ablated cases. If unspecified, will not accumulate weights.

  • batch_size (int, optional) – Define the number of cases to train at once. If left unspecified, the batch size will be determined automatically.

  • derived_features (iterable of str, optional) – List of feature names for which values should be derived in the specified order. If this list is not provided, features with the ‘auto_derive_on_train’ feature attribute set to True will be auto-derived. If provided an empty list, no features are derived. Any derived_features that are already in the ‘features’ list will not be derived since their values are being explicitly provided.

  • initial_batch_size (int, optional) – Define the number of cases to train in the first batch. If unspecified, the value of the train_initial_batch_size property is used. The number of cases in following batches will be automatically adjusted. This value is ignored if batch_size is specified.

  • input_is_substituted (bool, default False) – If True, assumes the provided nominal feature values have already been substituted.

  • progress_callback (callable, optional) – A callback method that will be called before each batched call to train and at the end of training. The method is given a ProgressTimer containing metrics on the progress and timing of the train operation.

  • series (str, optional) – Name of the series to pull features and case values from internal series storage. If specified, trains on all cases that are stored in the internal series store for the specified series. The trained feature set is the combined features from storage and the passed in features. If cases is of length one, the value(s) of this case are appended to all cases in the series. If cases is the same length as the series, the value of each case in cases is applied in order to each of the cases in the series.

  • skip_auto_analyze (bool, default False) – When true, the Trainee will not auto-analyze when appropriate. Instead, the boolean response will be True if an analyze is needed.

  • train_weights_only (bool, default False) – When true, and accumulate_weight_feature is provided, will accumulate all of the cases’ neighbor weights instead of training the cases into the model.

  • validate (bool, default True) – Whether to validate the data against the provided feature attributes. Issues warnings if there are any discrepancies between the data and the features dictionary.

Returns:

Flag indicating if the Trainee needs to analyze. Only true if auto-analyze is enabled and the conditions are met.

Return type:

bool
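
The two ways that cases are combined with a stored series in train() can be sketched on plain lists. This is a local illustration of the rule described for the series parameter, not the engine's code: `combine_with_series` is a hypothetical helper, and raising an error for any other length is an assumption.

```python
def combine_with_series(series_cases, cases):
    """Merge passed-in case values with the cases stored for a series."""
    if len(cases) == 1:
        # A single case: its values are appended to every series case.
        return [stored + cases[0] for stored in series_cases]
    if len(cases) == len(series_cases):
        # Same length: values are applied in order, case by case.
        return [stored + extra for stored, extra in zip(series_cases, cases)]
    # Assumption: other lengths are treated as an error in this sketch.
    raise ValueError("cases must have length 1 or match the series length")


series_cases = [[1, "a"], [2, "b"], [3, "c"]]
print(combine_with_series(series_cases, [["x"]]))
# [[1, 'a', 'x'], [2, 'b', 'x'], [3, 'c', 'x']]
print(combine_with_series(series_cases, [[10], [20], [30]]))
# [[1, 'a', 10], [2, 'b', 20], [3, 'c', 30]]
```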

unload_trainee(trainee_id)#

Unload a Trainee from the Howso service.

Deprecated since version 1.0.0: Use HowsoDirectClient.release_trainee_resources() instead.

Parameters:

trainee_id (str) – The ID of the Trainee to unload.

update_session(session_id, *, metadata=None)#

Update a session.

Note

Updates the session across all loaded trainees.

Parameters:
  • session_id (str) – The id of the session to update metadata for.

  • metadata (dict, optional) – Any key-value pair to store as custom metadata for the session.

Returns:

The updated session instance.

Return type:

howso.openapi.models.Session

Raises:
  • TypeError – If metadata is non-None and not a dictionary.

  • HowsoError – If session_id is not found for the active session or any of the sessions of the loaded Trainees.

update_trainee(trainee)#

Update an existing Trainee in the Howso service.

Parameters:

trainee (Trainee) – A Trainee object defining the Trainee.

Returns:

The Trainee object that was updated.

Return type:

Trainee

upgrade_trainee(trainee_id, path_to_trainee=None, separate_files=False)#

Upgrade a saved Trainee to current version.

Parameters:
  • trainee_id (str) – The ID of the Trainee.

  • path_to_trainee (Path or str, optional) – The path to where the saved Trainee file is located.

  • separate_files (bool, default False) – Whether to load each case from its individual file.

BAD_TRAINEE_NAME_CHARS = {'..', '/', ':', '\\'}#

The characters which are disallowed from being a part of a Trainee name or ID.
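
A name can be checked against this set before it is used, as in the minimal sketch below. The set is copied from the attribute shown here so the example runs on its own, and `find_bad_sequences` is a hypothetical helper, not a method of the client.

```python
# Copied from the class attribute so the sketch is self-contained.
BAD_TRAINEE_NAME_CHARS = {'..', '/', ':', '\\'}


def find_bad_sequences(name):
    """Return the disallowed sequences found in a Trainee name or ID."""
    return sorted(seq for seq in BAD_TRAINEE_NAME_CHARS if seq in name)


print(find_bad_sequences("my_model"))       # []
print(find_bad_sequences("../models/a:b"))  # ['..', '/', ':']
```

Note that '..' is a two-character sequence, so the check uses substring membership rather than comparing single characters.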

SUPPORTED_PRECISION_VALUES = ['exact', 'similar']#

The supported values of precision for methods that accept it.

property active_session: Session#

Return the active session.

Returns:

The active session instance.

Return type:

howso.openapi.models.Session

property react_initial_batch_size: int#

The default number of cases in the first react batch.

Returns:

The default number of cases to react to in the first batch.

Return type:

int

property train_initial_batch_size: int#

The default number of cases in the first train batch.

Returns:

The default number of cases to train in the first batch.

Return type:

int

property trainee_cache: TraineeCache#

Return the trainee cache.

Returns:

The trainee cache.

Return type:

howso.client.cache.TraineeCache