Utilities Package#

from howso.utilities import ...

This module contains various utilities for the Howso clients.

class howso.utilities.FeatureType#

Bases: Enum

Feature type enum.

class howso.utilities.LocaleOverride#

Bases: object

Implements a thread-safe context manager for switching locales temporarily.

Background#

Python’s locale.setlocale() is not thread-safe. To work with alternate locales temporarily, this ContextDecorator acquires a thread lock on __enter__ and releases it on __exit__.

Important Notes#

All other threads will be blocked within the scope of the context. It is important to avoid time-consuming execution inside.

Example Usage#

>>> # Parse a date string in French and format it in English.
>>> # The system locale is 'en_US' in this example.
>>> import locale
>>> from datetime import datetime
>>> from howso.utilities import LocaleOverride
>>> dt_format = '<some format>'
>>> dt_value = '<some French date string>'
>>> with LocaleOverride('fr_FR', category=locale.LC_TIME):
...     # French date formatting applies inside this block.
...     dt_obj = datetime.strptime(dt_value, dt_format)
...
>>> # Back in the 'en_US' locale again.
>>> dt_value = dt_obj.strftime(dt_format)

Parameters:
  • language_code (str) –

    A language code, usually given as one of:

    • 2 lowercase letters for the base language, e.g. fr for French.

    • 5 characters such as fr_CA, where the first 2 designate the base language (French in this example), followed by an _, followed by 2 uppercase characters designating the country-specific dialect (Canada in this example). This example designates the French-Canadian locale.

    • Any of the above, plus an optional encoding following a ‘.’, e.g. fr_FR.UTF-8.

  • encoding (str) – An encoding such as ‘UTF-8’ or ‘ISO8859-1’. If not provided and there is no encoding embedded in the language_code parameter, ‘UTF-8’ is used. If an encoding is embedded in the language_code parameter and an explicit encoding is also provided here, the embedded encoding is ignored.

  • category (int) – One of the category constants defined in the locale module. See https://docs.python.org/3.9/library/locale.html for details. locale.LC_ALL is used if nothing is provided.
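
The precedence between an embedded and an explicit encoding described above can be sketched as follows. The helper name `resolve_locale_name` is hypothetical and not part of howso.utilities; it only mirrors the documented rules and makes no claim about how LocaleOverride is implemented.

```python
def resolve_locale_name(language_code, encoding=None):
    """Combine a language code and an encoding using the documented precedence."""
    # Split off any encoding embedded after a '.', e.g. 'fr_FR.ISO8859-1'.
    base, _, embedded = language_code.partition(".")
    # An explicit encoding wins; otherwise use the embedded one, else 'UTF-8'.
    return f"{base}.{encoding or embedded or 'UTF-8'}"


print(resolve_locale_name("fr"))                         # fr.UTF-8
print(resolve_locale_name("fr_CA.ISO8859-1"))            # fr_CA.ISO8859-1
print(resolve_locale_name("fr_FR.ISO8859-1", "UTF-8"))   # fr_FR.UTF-8
```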

__init__(language_code, encoding=None, category=6)#

Construct the context manager.

restore()#

Restore the original locale and release the thread lock.

Use this method directly to restore the current context when not using this class as a context manager.

setup()#

Set a thread lock and the locale as desired.

Use this method directly to setup a locale context when not using this class as a context manager.

class howso.utilities.ProgressTimer#

Bases: Timer

Monitor progress of a task.

Parameters:
  • total_ticks (int, default 100) – The total number of ticks in the progress meter.

  • start_tick (int, default 0) – The starting tick.

__init__(total_ticks=100, *, start_tick=0)#

Initialize a new ProgressTimer instance.

Parameters:
  • total_ticks (int) –

  • start_tick (int) –

property is_complete: bool#

If progress has reached completion.

property progress: float#

The current progress percentage.

reset()#

Reset the progress timer.

Return type:

None

start()#

Start the progress timer.

Return type:

ProgressTimer

property tick_duration: timedelta | None#

The duration since the last tick.

Returns:

The duration since the last tick, or None if not yet started.

Return type:

timedelta or None

property time_remaining: timedelta#

The estimated time remaining.

Returns:

The time estimated to be remaining.

Return type:

timedelta

Raises:

ValueError – If timer not yet started.

update(ticks=1)#

Update the progress by given ticks.

Parameters:

ticks (int, default 1) – The number of ticks to increment/decrement by.

Return type:

None
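
One plausible way to estimate the time remaining is to extrapolate linearly from the average tick duration so far. The function below is a hypothetical sketch of that idea; it is not taken from ProgressTimer, whose estimate may be computed differently.

```python
from datetime import timedelta


def estimate_time_remaining(elapsed, current_tick, total_ticks=100):
    """Linearly extrapolate the remaining time from the average tick duration."""
    if current_tick <= 0:
        raise ValueError("No ticks completed yet; cannot estimate.")
    average_per_tick = elapsed / current_tick
    return average_per_tick * max(total_ticks - current_tick, 0)


# 25 of 100 ticks done in 30 seconds leaves an estimated 90 seconds.
print(estimate_time_remaining(timedelta(seconds=30), 25))  # 0:01:30
```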

exception howso.utilities.StopExecution#

Bases: Exception

Raise a StopExecution as this is a cleaner exit() for Notebooks.

class howso.utilities.Timer#

Bases: object

Simple context manager to capture run duration of the inner context.

Usage:

with Timer() as my_timer:
    # Perform the time-consuming task here...
    ...
print(f"The task took {my_timer.duration}.")

Results in:

"The task took 1:30:10.454419."

__init__()#

Initialize a new Timer instance.

property duration: timedelta | None#

The total computed duration of the timer.

Returns:

The total duration of the timer. When the timer has not yet ended, the duration between now and when the timer started will be returned. If the timer has not yet started, returns None.

Return type:

timedelta or None

end()#

End the timer.

Return type:

None

property has_ended: bool#

If the timer has ended.

property has_started: bool#

If the timer has started.

reset()#

Reset the timer.

Return type:

None

property seconds: float | None#

The total seconds representing the duration of timer instance.

start()#

Start the timer.

Return type:

Timer
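
The duration semantics described above (None before starting, a running value while active, a fixed value once ended) can be sketched with a small stand-in class. `SketchTimer` is hypothetical and only mimics the documented behavior.

```python
from datetime import datetime, timedelta


class SketchTimer:
    """Hypothetical stand-in that mimics the documented Timer semantics."""

    def __init__(self):
        self.start_time = None
        self.end_time = None

    def start(self):
        self.start_time = datetime.now()
        self.end_time = None
        return self

    def end(self):
        self.end_time = datetime.now()

    @property
    def duration(self):
        # Not started: no duration. Running: time so far. Ended: fixed span.
        if self.start_time is None:
            return None
        return (self.end_time or datetime.now()) - self.start_time

    def __enter__(self):
        return self.start()

    def __exit__(self, exc_type, exc_value, traceback):
        self.end()
        return False


idle = SketchTimer()
print(idle.duration)  # None, because the timer never started

with SketchTimer() as timer:
    sum(range(100_000))
print(f"The task took {timer.duration}.")
```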

class howso.utilities.UserFriendlyExit#

Bases: object

Return a callable that, when called, simply prints msg and cleanly exits.

Parameters:

verbose (bool) – If True, emit more information

__init__(verbose=False)#

Construct a UserFriendlyExit instance.

howso.utilities.align_data(x, y=None)#

Check and fix type problems with the data and reshape it.

x is a Matrix and y is a vector.

Parameters:
  • x (numpy.ndarray) – Feature values ndarray.

  • y (numpy.ndarray, default None) – Target values ndarray.

Return type:

(numpy.ndarray, numpy.ndarray) or numpy.ndarray

howso.utilities.build_react_series_df(react_series_response, series_index=None)#

Build a DataFrame from the response from react_series.

If series_index is set, use that as a name for an additional feature that will be the series index.

Parameters:
  • react_series_response (Dictionary) – The response dictionary from a call to react_series.

  • series_index (String) – The name of the series index feature, which will index each series in the form ‘series_<idx>’, e.g., series_1, series_2, …, series_n. If None, does not include the series index feature in the returned DataFrame.

Returns:

A Pandas DataFrame defined by the action features and series data in the react_series response. Optionally includes a series index feature.

Return type:

pd.DataFrame

howso.utilities.check_feature_names(features, expected_feature_names, raise_error=False)#

Check if the feature names in the features dict match expected_feature_names.

Parameters:
  • features (Mapping) – A feature dictionary that maps feature names to their attributes.

  • expected_feature_names (Collection) – A list (or a set) of expected column names in the given features dictionary.

  • raise_error (bool, defaults to False) – Raise a ValueError if the feature names don’t match between features and expected_feature_names.

Returns:

True if the feature names in features match the expected feature names passed via expected_feature_names. Otherwise, False.

Return type:

bool

Raises:

ValueError – If raise_error is True and the feature names in the features dict don’t match the feature names in expected_feature_names.
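
The check can be pictured as a set comparison. The sketch below uses a hypothetical helper, `names_match`, and assumes that ordering is irrelevant; the actual function may compare names differently.

```python
def names_match(features, expected_feature_names, raise_error=False):
    """Compare the keys of a features mapping with the expected names."""
    actual = set(features)
    expected = set(expected_feature_names)
    if actual == expected:
        return True
    if raise_error:
        # Report the names present on only one side.
        raise ValueError(f"Feature names differ: {sorted(actual ^ expected)}")
    return False


features = {"age": {"type": "continuous"}, "city": {"type": "nominal"}}
print(names_match(features, ["city", "age"]))   # True
print(names_match(features, ["age", "height"])) # False
```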

howso.utilities.date_format_is_iso(f)#

Check if datetime format is ISO8601.

Does format match the iso8601 set that can be handled by the C parser? Generally of form YYYY-MM-DDTHH:MM:SS - date separator can be different but must be consistent. Leading 0s in dates and times are optional.

Sourced from Pandas: pandas-dev/pandas

howso.utilities.date_to_epoch(date_obj, time_format)#

Convert date into epoch (i.e., seconds counted from Jan 1st 1970).

Note

If date_obj is None or nan, it will be returned as is.

Parameters:
  • date_obj (str or datetime.date or datetime.time or datetime.datetime) – Time object.

  • time_format (str) – Specify format of the time. Ex: %a %b %d %H:%M:%S %Y

Returns:

The epoch date as a floating point value or ‘np.nan’, et al.

Return type:

Union[str, float]

howso.utilities.deserialize_cases(data, columns, features=None)#

Deserialize case data into a DataFrame.

If feature attributes contain original typing information, columns will be converted to the same data type as original training cases.

Parameters:
  • data (list of list or list of dict) – The context data.

  • columns (list of str) –

    The case column mapping.

    The order corresponds to how the data will be mapped to columns in the output. Ignored for list of dict where the dict key is the column name.

  • features (dict, default None) –

    (Optional) The dictionary of feature name to feature attributes.

    If not specified, no column typing will be attempted.

Returns:

The deserialized data.

Return type:

pandas.DataFrame

howso.utilities.determine_iso_format(str_date, fname)#

Determine which specific ISO8601 format the passed in date is in.

Specifically if it’s just a date, if it’s zoned, and if zoned, whether it’s a zone or an offset.

Parameters:
  • str_date (str) – The Date time passed in as a string.

  • fname (str) – Name of feature to guess bounds for.

Returns:

The ISO_8601 format string that most matches the passed in date.

Return type:

str

howso.utilities.dprint(debug, *argc, **kwargs)#

Print based on debug levels.

Parameters:
  • debug (bool or int) – If True, the user debug level is treated as 1. Possible levels: 1, 2, 3 (print all).

  • kwargs –

    default_priority (int, default 1) – The message is printed only if debug >= default_priority.

Examples

>>> dprint(True, "hello", "howso", priority=1)
`hello howso`
howso.utilities.epoch_to_date(epoch, time_format, tzinfo=None)#

Convert epoch to date. If epoch is None or nan, it is returned as is.

Parameters:
  • epoch (Union[str, float]) – The epoch date as a floating point value (or str if np.nan, et al)

  • time_format (str) – Specify format of the time. Ex: %a %b %d %H:%M:%S %Y

  • tzinfo (datetime.tzinfo, optional) – Time zone information to include in datetime.

Returns:

A date string in the format similar to “Wed May 21 00:00:00 2008”

Return type:

str
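
The round trip between a formatted date string and an epoch value can be sketched with the standard library alone. The helpers `to_epoch` and `from_epoch` are hypothetical, and treating naive datetimes as UTC is an assumption of this sketch rather than documented howso behavior.

```python
from datetime import datetime, timezone


def to_epoch(date_str, time_format):
    """Parse a date string and return seconds since Jan 1st 1970 (UTC)."""
    parsed = datetime.strptime(date_str, time_format)
    if parsed.tzinfo is None:
        # Assumption: naive values are interpreted as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.timestamp()


def from_epoch(epoch, time_format, tzinfo=timezone.utc):
    """Format an epoch value as a date string in the given time zone."""
    return datetime.fromtimestamp(epoch, tz=tzinfo).strftime(time_format)


fmt = "%Y-%m-%d %H:%M:%S"
epoch = to_epoch("2008-05-21 00:00:00", fmt)
print(epoch)                   # 1211328000.0
print(from_epoch(epoch, fmt))  # 2008-05-21 00:00:00
```

A numeric format is used here because directives such as %a and %b depend on the active locale.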

howso.utilities.format_dataframe(df, features)#

Format DataFrame columns to original type using feature attributes.

Note

Modifies DataFrame in place.

Parameters:
  • df (pandas.DataFrame) – The DataFrame to format columns of.

  • features (Dict) – The dictionary of feature name to feature attributes.

Returns:

The formatted data.

Return type:

pandas.DataFrame

howso.utilities.get_kwargs(kwargs, descriptors, warn_on_extra=False)#

Decompose kwargs into a tuple of return values.

Each value in the returned tuple corresponds to a descriptor in ‘descriptors’. Optionally issue a warning on any items in kwargs that are not “consumed” by the descriptors.

Parameters:
  • kwargs (dict) – Mapping of keys and values (kwargs)

  • descriptors

    An iterable of descriptors for how to handle each item in kwargs. Each descriptor can be a mapping, another iterable, or a single string.

    If a mapping, it must at least include the key: ‘key’ but can also optionally include the keys: ‘default’ and ‘test’.

    If a non-mapping iterable, the values will be interpreted as ‘key’, ‘default’, ‘test’, in that order. Only the first is required; the remaining values are set to None if not provided.

    If a string is provided, it is used as the ‘key’; ‘default’ and ‘test’ are set to None.

    If a ‘key’ is not found in the kwargs, then the ‘default’ value is returned.

    If a descriptor contains a ‘test’, it should be a callable that returns a boolean. If False, the ‘default’ value is returned.

    If the ‘default’ provided is an instance of an Exception, then, the exception is raised when the ‘key’ is not present, or the ‘test’ fails.

  • warn_on_extra (bool) – If True, will issue warnings about any keys provided in kwargs that were not consumed by the descriptors. Default is False

Returns:

A tuple of the found values, in the same order as the provided descriptors.

Raises:

May raise any exception given as a ‘default’ in the descriptors.

Usage#

An example of usage showing various ways to use descriptors:

>>> def my_method(self, required, **kwargs):
...     apple, banana, cherry, durian, elderberry = get_kwargs(kwargs, (
...         # A simple string is interpreted as the 'key' with a 'default' of
...         # `None` and no test. Very common use-case made simple.
...         'apple',
...
...         # Another common use-case. Set value to 5 if not in kwargs.
...         # This also shows using a tuple for the descriptor.
...         ('banana', 5),
...
...         # Verbose input including a test, using a dict.
...         {'key': 'cherry', 'default': 5, 'test': lambda x: x > 0},
...
...         # The test, `is_durian`, is defined elsewhere.
...         ('durian', None, is_durian),
...
...         # Full example using an iterable descriptor rather than a mapping.
...         ('elderberry', ValueError('"elderberry" must be > 5.'),
...             lambda x: x > 5),
...     ))
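
The descriptor rules above can be condensed into a small reference sketch. `get_kwargs_sketch` is a hypothetical re-implementation written from the documented rules only; it omits warn_on_extra and may differ from the real function in edge cases.

```python
from collections.abc import Mapping


def get_kwargs_sketch(kwargs, descriptors):
    """Resolve each descriptor to a value following the documented rules."""
    results = []
    for descriptor in descriptors:
        # Normalize every descriptor form to (key, default, test).
        if isinstance(descriptor, str):
            key, default, test = descriptor, None, None
        elif isinstance(descriptor, Mapping):
            key = descriptor["key"]
            default = descriptor.get("default")
            test = descriptor.get("test")
        else:
            key, default, test = (list(descriptor) + [None, None])[:3]

        if key in kwargs and (test is None or test(kwargs[key])):
            results.append(kwargs[key])
        elif isinstance(default, Exception):
            # Missing key or failed test with an exception as the default.
            raise default
        else:
            results.append(default)
    return tuple(results)


apple, banana, cherry = get_kwargs_sketch(
    {"banana": 2, "cherry": -1},
    (
        "apple",
        ("banana", 5),
        {"key": "cherry", "default": 5, "test": lambda x: x > 0},
    ),
)
print(apple, banana, cherry)  # None 2 5
```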
howso.utilities.infer_feature_attributes(data, *, tables=None, time_feature_name=None, **kwargs)#

Return a dict-like feature attributes object with useful accessor methods.

The returned object is a subclass of FeatureAttributesBase that is appropriate for the provided data type.

Parameters:
  • data (Any) – The data source to infer feature attributes from. Must be a supported data type.

  • tables (Iterable of TableNameProtocol) –

    (Optional, required for datastores) An Iterable of table names to infer feature attributes for.

    If included, feature attributes will be generated in the form {table_name: {feature_attribute: value}}.

  • time_feature_name (str, default None) – (Optional, required for time series) The name of the time feature.

  • features (dict or None, default None) –

    (Optional) A partially filled features dict. If partially filled attributes for a feature are passed in, those parameters will be retained as is and the rest of the attributes will be inferred.

    For example:
    >>> from pprint import pprint
    >>> df.head(2)
    ... sepal-length  sepal-width  petal-length  petal-width  target
    ... 0           6.7          3.0           5.2          2.3       2
    ... 1           6.0          2.2           5.0          1.5       2
    >>> # Partially filled features dict
    >>> partial_features = {
    ...     "sepal-length": {
    ...         "type": "continuous",
    ...         'bounds': {
    ...             'min': 2.72,
    ...             'max': 3,
    ...             'allow_null': True
    ...         },
    ...     },
    ...     "sepal-width": {
    ...         "type": "continuous"
    ...     }
    ... }
    >>> # Infer rest of the attributes
    >>> features = infer_feature_attributes(
    ...     df, features=partial_features
    ... )
    >>> # Inferred Feature dictionary
    >>> pprint(features)
    ... {
    ...     'sepal-length': {
    ...         'bounds': {
    ...             'allow_null': True, 'max': 3, 'min': 2.72
    ...         },
    ...         'type': 'continuous'
    ...     },
    ...     'sepal-width': {
    ...         'bounds': {
    ...             'allow_null': True, 'max': 7.38905609893065,
    ...             'min': 1.0
    ...         },
    ...         'type': 'continuous'
    ...     },
    ...     'petal-length': {
    ...         'bounds': {
    ...             'allow_null': True, 'max': 7.38905609893065,
    ...             'min': 1.0
    ...         },
    ...         'type': 'continuous'
    ...     },
    ...     'petal-width': {
    ...         'bounds': {
    ...             'allow_null': True, 'max': 2.718281828459045,
    ...             'min': 0.049787068367863944
    ...         },
    ...         'type': 'continuous'
    ...     },
    ...     'target': {
    ...         'bounds': {'allow_null': True},
    ...         'type': 'nominal'
    ...     }
    ... }
    

    Note that valid ‘data_type’ values for both nominal and continuous types are: ‘string’, ‘number’, ‘json’, ‘amalgam’, and ‘yaml’. The ‘boolean’ data_type is valid only when type is nominal. ‘string_mixable’ is valid only when type is continuous (predicted values may result in interpolated strings containing a combination of characters from multiple original values).

  • infer_bounds (bool, default True) – (Optional) If True, bounds will be inferred for the features if the feature column has at least one non NaN value

  • datetime_feature_formats (dict, default None) –

    (Optional) Dict defining a custom (non-ISO8601) datetime format and an optional locale for features with datetimes. By default datetime features are assumed to be in ISO8601 format. Non-English datetimes must have locales specified. If locale is omitted, the default system locale is used. The keys are the feature name, and the values are a tuple of date time format and locale string.

    Example:

    {
        "start_date": ("%Y-%m-%d %A %H.%M.%S", "es_ES"),
        "end_date": "%Y-%m-%d"
    }
    

  • delta_boundaries (dict, default None) –

    (Optional) For time series, specify the delta boundaries in the form {“feature” : {“min|max” : {order : value}}}. Works with partial values by specifying only particular order of derivatives you would like to overwrite. Invalid orders will be ignored.

    Examples:

    {
        "stock_value": {
            "min": {
                '0' : 0.178,
                '1': 3.4582e-3,
                '2': None
            }
        }
    }
    

  • derived_orders (dict, default None) – (Optional) Dict of features to the number of orders of derivatives that should be derived instead of synthesized. For example, for a feature with a 3rd order of derivative, setting its derived_orders to 2 will synthesize the 3rd order derivative value, and then use that synthesized value to derive the 2nd and 1st order.

  • lags (list or dict, default None) –

    (Optional) A list containing the specific indices of the desired lag features to derive for each feature (not including the series time feature). Specifying derived lag features for the feature specified by time_feature_name must be done using a dictionary. A dictionary can be used to specify a list of specific lag indices for specific features. For example: {“feature1”: [1, 3, 5]} would derive three different lag features for feature1. The resulting lag features hold values 1, 3, and 5 timesteps behind the current timestep respectively.

    Note

    Using the lags parameter will override the num_lags parameter per feature

    Note

    A lag feature is a feature that provides a “lagging value” to a case by holding the value of a feature from a previous timestep. These lag features allow for cases to hold more temporal information.

  • num_lags (int or dict, default None) –

    (Optional) An integer specifying the number of lag features to derive for each feature (not including the series time feature). Specifying derived lag features for the feature specified by time_feature_name must be done using a dictionary. A dictionary can be used to specify numbers of lags for specific features. Features that are not specified will default to 1 lag feature.

    Note

    The num_lags parameter will be overridden by the lags parameter per feature.

  • orders_of_derivatives (dict, default None) – (Optional) Dict of features and their corresponding order of derivatives for the specified type (delta/rate). If provided will generate the specified number of derivatives and boundary values. If set to 0, will not generate any delta/rate features. By default all continuous features have an order value of 1.

  • rate_boundaries (dict, default None) –

    (Optional) For time series, specify the rate boundaries in the form {“feature” : {“min|max” : {order : value}}}. Works with partial values by specifying only particular order of derivatives you would like to overwrite. Invalid orders will be ignored.

    Examples:

    {
        "stock_value": {
            "min": {
                '0' : 0.178,
                '1': 3.4582e-3,
                '2': None
            }
        }
    }
    

  • tight_bounds (Iterable of str, default None) – (Optional) Set tight min and max bounds for the features specified in the Iterable.

  • tight_time_bounds (bool, default False) – (Optional) If True, will set tight bounds on time_feature. This will cause the bounds for the start and end times to be set to the same bounds as observed in the original data.

  • time_feature_is_universal (bool, optional) – If True, the time feature will be treated as universal and future data is excluded while making predictions. If False, the time feature will not be treated as universal and only future data within the same series is excluded while making predictions. It is recommended to set this value to True if there is any possibility of global relevancy of time, which is the default behavior.

  • time_series_type_default (str, default 'rate') – (Optional) Type specifying how time series is generated. One of ‘rate’ or ‘delta’, default is ‘rate’. If ‘rate’, it uses the difference of the current value from its previous value divided by the change in time since the previous value. When ‘delta’ is specified, just uses the difference of the current value from its previous value regardless of the elapsed time.

  • time_series_types_override (dict, default None) – (Optional) Dict of features and their corresponding time series type, one of ‘rate’ or ‘delta’, used to override time_series_type_default for the specified features.

  • mode_bound_features (list of str, default None) – (Optional) Explicit list of feature names to use mode bounds for when inferring loose bounds. If None, assumes all features. A mode bound is used instead of a loose bound when the mode for the feature is the same as an original bound, as it may represent an application-specific min/max.

  • id_feature_name (str or list of str, default None) – (Optional) The name(s) of the ID feature(s).

  • time_invariant_features (list of str, default None) – (Optional) Names of time-invariant features.

  • attempt_infer_extended_nominals (bool, default False) –

    (Optional) If set to True, detections of extended nominals will be attempted. If the detection fails, the categorical variables will be set to int-id subtype.

    Note

    Please refer to kwargs for other parameters related to extended nominals.

  • nominal_substitution_config (dict of dicts, default None) – (Optional) Configuration of the nominal substitution engine and the nominal generators and detectors.

  • include_extended_nominal_probabilities (bool, default False) – (Optional) If true, extended nominal probabilities will be appended as metadata into the feature object.

  • datetime_feature_formats

    (optional) Dict defining a custom (non-ISO8601) datetime format and an optional locale for columns with datetimes. By default datetime columns are assumed to be in ISO8601 format. Non-English datetimes must have locales specified. If locale is omitted, the default system locale is used. The keys are the column name, and the values are a tuple of date time format and locale string:

    Example:

    {
        "start_date" : ("%Y-%m-%d %A %H.%M.%S", "es_ES"),
        "end_date" : "%Y-%m-%d"
    }
    

  • ordinal_feature_values (dict, default None) –

    (optional) Dict for ordinal string features defining an ordered list of string values for each feature, ordered low to high. If specified will set ‘type’ to be ‘ordinal’ for all features in this map.

    Example:

    {
        "grade" : [ "F", "D", "C", "B", "A" ],
        "size" : [ "small", "medium", "large", "huge" ]
    }
    

  • dependent_features (dict, default None) –

    (Optional) Dict of features with their respective lists of features that either the feature depends on or are dependent on them. Should be used when there are multi-type value features that tightly depend on values based on other multi-type value features.

    Examples:

    If there’s a feature name ‘measurement’ that contains measurements such as BMI, heart rate and weight, while the feature ‘measurement_amount’ contains the numerical values corresponding to the measurement, dependent features could be passed in as follows:

    {
        "measurement": [ "measurement_amount" ]
    }
    

    Since dependence directionality is not important, this will also work:

    {
        "measurement_amount": [ "measurement" ]
    }
    

Returns:

A subclass of FeatureAttributesBase (Single/MultiTableFeatureAttributes) that extends dict, thus providing dict-like access to feature attributes and useful accessor methods.

Return type:

FeatureAttributesBase

Examples

# 'data' is a DataFrame
>> attrs = infer_feature_attributes(data)
# Can access feature attributes like a dict
>> attrs
    {
        "feature_one": {
            "type": "continuous",
            "bounds": {"allow_null": True},
        },
        "feature_two": {
            "type": "nominal",
        }
    }
>> attrs["feature_one"]
    {
        "type": "continuous",
        "bounds": {"allow_null": True}
    }
# Or can call methods to do other stuff
>> attrs.get_parameters()
    {'type': "continuous"}

# Now 'data' is an object that implements SQLRelationalDatastoreProtocol
>> attrs = infer_feature_attributes(data, tables=tables)
>> attrs
    {
        "table_1": {
            "feature_one": {
                "type": "continuous",
                "bounds": {"allow_null": True},
            },
            "feature_two": {
                "type": "nominal",
            }
        },
        "table_2" : {...},
    }
>> attrs.to_json()
    '{"table_1" : {...}}'
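
The time series terms used in the parameters above (lag features, and the ‘delta’ and ‘rate’ series types) can be illustrated with plain lists. The helper names are hypothetical; this sketch only demonstrates the definitions given in the text, not how howso derives these features.

```python
def lag(values, steps):
    """Hold the value from `steps` timesteps earlier, padding with None."""
    return [None] * steps + values[: len(values) - steps]


def deltas(values):
    """Difference of each value from its previous value."""
    return [curr - prev for prev, curr in zip(values, values[1:])]


def rates(values, times):
    """Delta divided by the change in time since the previous value."""
    return [
        (values[i] - values[i - 1]) / (times[i] - times[i - 1])
        for i in range(1, len(values))
    ]


values = [10, 12, 15, 19]
times = [0, 2, 4, 8]
print(lag(values, 1))        # [None, 10, 12, 15]
print(deltas(values))        # [2, 3, 4]
print(rates(values, times))  # [1.0, 1.5, 1.0]
```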
howso.utilities.is_valid_uuid(value, version=4)#

Check if a given string is a valid uuid.

Parameters:
  • value (str or UUID) – The value to test

  • version (int, optional) – The uuid version (Default: 4)

Returns:

True if value is a valid uuid string

Return type:

bool
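
A version-aware UUID check can be sketched with the standard uuid module. `looks_like_uuid` is a hypothetical helper, and requiring the canonical hyphenated form is an assumption of this sketch.

```python
import uuid


def looks_like_uuid(value, version=4):
    """Return True if value is a canonical UUID string of the given version."""
    text = str(value)
    try:
        parsed = uuid.UUID(text, version=version)
    except ValueError:
        return False
    # uuid.UUID overwrites the version bits, so a mismatch changes the string.
    return str(parsed) == text.lower()


print(looks_like_uuid(uuid.uuid4()))                # True
print(looks_like_uuid("not-a-uuid"))                # False
print(looks_like_uuid(uuid.uuid1(), version=4))     # False
```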

howso.utilities.num_list_dimensions(lst)#

Return number of dimensions for a list.

Assumption is that the input nested lists are also lists, or a list of dataframes.

Parameters:

lst (list) – The nested list of objects.

Returns:

The number of dimensions in the passed in list.

Return type:

int
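
Counting dimensions of a nested list can be sketched by descending through the first element at each level. `count_dimensions` is a hypothetical helper that handles nested lists only; it assumes the nesting is uniform and does not cover the DataFrame case mentioned above.

```python
def count_dimensions(lst):
    """Count nesting depth by following the first element of each level."""
    depth = 0
    current = lst
    while isinstance(current, list):
        depth += 1
        if not current:
            break  # An empty list still counts as one dimension.
        current = current[0]
    return depth


print(count_dimensions([1, 2, 3]))         # 1
print(count_dimensions([[1, 2], [3, 4]]))  # 2
print(count_dimensions([[[1]]]))           # 3
```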

howso.utilities.replace_doublemax_with_infinity(dat)#

Replace values of Double.MAX_VALUE (1.79769313486232E+308) with Infinity.

For use when retrieving data from Howso.

Parameters:

dat (A dict, list, number, or string) –

Return type:

A dict, list, number, or string - same as passed in for translation
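
A recursive replacement over nested dicts and lists might look like the sketch below. `doublemax_to_infinity` is a hypothetical helper; it assumes only the positive maximum double is translated, as the description states.

```python
import math
import sys


def doublemax_to_infinity(dat):
    """Return a copy of dat with the maximum double replaced by infinity."""
    if isinstance(dat, dict):
        return {key: doublemax_to_infinity(value) for key, value in dat.items()}
    if isinstance(dat, list):
        return [doublemax_to_infinity(item) for item in dat]
    if isinstance(dat, float) and dat == sys.float_info.max:
        return math.inf
    return dat  # Strings and other numbers pass through unchanged.


bounds = {"max": sys.float_info.max, "values": [1.0, sys.float_info.max], "name": "x"}
print(doublemax_to_infinity(bounds))
```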

howso.utilities.replace_nan_with_none(dat)#

Replace NaN values with None values.

For use when feeding data to Howso from the scikit module to account for the different ways howso and sklearn represent missing values.

Parameters:

dat (list of list of object) – A 2d list of values.

Return type:

list[list[object]]

howso.utilities.replace_none_with_nan(dat)#

Replace None values with NaN values.

For use when retrieving data from Howso via the scikit module to conform to sklearn convention on missing values.

Parameters:

dat (list of dict of key-values) –

Return type:

list[dict]
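
The two conversions are mirror images of each other. The sketch below uses hypothetical helper names and the input shapes given in the parameter descriptions (a 2d list in one direction, a list of dicts in the other).

```python
import math


def nan_to_none(rows):
    """Replace float NaN with None in a 2d list of values."""
    return [
        [None if isinstance(v, float) and math.isnan(v) else v for v in row]
        for row in rows
    ]


def none_to_nan(records):
    """Replace None with float NaN in a list of dicts."""
    return [
        {key: (math.nan if value is None else value) for key, value in record.items()}
        for record in records
    ]


print(nan_to_none([[1.0, float("nan")], ["a", None]]))  # [[1.0, None], ['a', None]]
converted = none_to_nan([{"a": None, "b": 2}])
print(converted)
```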

howso.utilities.reshape_data(x, y)#

Reshapes X as a matrix and y as a vector.

Parameters:
  • x (np.ndarray) – Feature values ndarray.

  • y (np.ndarray) – target values ndarray.

Returns:

X, y

Return type:

np.ndarray, np.ndarray

howso.utilities.seconds_to_time(seconds, *, tzinfo=None)#

Convert seconds to a time object.

Parameters:
  • seconds (int or float) – The seconds to convert to time.

  • tzinfo (datetime.tzinfo, optional) – Time zone to use for resulting time object.

Returns:

The time object.

Return type:

datetime.time

howso.utilities.serialize_cases(data, columns, features, *, warn=False)#

Serialize case data into list of lists.

Parameters:
  • data (pandas.DataFrame or numpy.ndarray or list of list) – The data to serialize.

  • columns (list of str) – The case column mapping. The order corresponds to the order of cases in output.

  • features (dict) – The dictionary of feature name to feature attributes.

  • warn (bool, default False) – If warnings should be raised by serializer.

Returns:

The serialized data from DataFrame.

Return type:

list of list or None

howso.utilities.serialize_datetimes(cases, columns, features, *, warn=False)#

Serialize datetimes in the given list of cases, in-place.

Iterate over the passed-in case values and serialize any datetime values according to the specified datetime format in feature attributes.

Parameters:
  • cases (list of list) – A 2d list of case values corresponding to the features of the cases.

  • columns (list of str) – A list of feature names.

  • features (dict) – Dictionary of feature attributes.

  • warn (bool, default: False) – If set to true, will warn user when specified datetime format doesn’t match the datetime strings.

Return type:

None

howso.utilities.time_to_seconds(time)#

Convert a time object to seconds since midnight.

Parameters:

time (datetime.time) – The time to convert.

Returns:

Seconds since midnight.

Return type:

float
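
The conversion between a time object and seconds since midnight, in both directions, can be sketched with the standard library. The helper names are hypothetical; wrapping values of a full day or more around midnight is a property of this sketch, not documented behavior of seconds_to_time.

```python
from datetime import datetime, time, timedelta


def time_to_secs(value):
    """Seconds since midnight for a datetime.time, including microseconds."""
    return (
        value.hour * 3600 + value.minute * 60 + value.second
        + value.microsecond / 1_000_000
    )


def secs_to_time(seconds, tzinfo=None):
    """Build a datetime.time from non-negative seconds since midnight."""
    moment = datetime.min + timedelta(seconds=seconds)
    return moment.time().replace(tzinfo=tzinfo)


print(time_to_secs(time(1, 30, 15)))  # 5415.0
print(secs_to_time(5415))             # 01:30:15
```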

howso.utilities.trainee_from_df(df, features=None, action_features=None, name=None, persistence='allow', trainee_metadata=None)#

Create a Trainee from a dataframe.

Assumes floats are continuous and all other values are nominal.

Parameters:
  • df (pandas.DataFrame) – A pandas DataFrame with column names corresponding to feature names. Features that are considered to be continuous should have a dtype of float.

  • features (Optional[Mapping[str, Mapping]]) – (Optional) A dictionary of feature names to a dictionary of parameters.

  • action_features (List of String, Default None) – (Optional) List of action features. Anything that’s not in this list will be treated as a context feature. For example, if no action feature is specified the trainee won’t have a target.

  • name (str or None, defaults to None) – (Optional) The name of the trainee.

  • persistence (str: default "allow") – The persistence setting to use for the trainee. Valid values: “always”, “allow”, “never”.

  • trainee_metadata (Mapping, optional) – (Optional) mapping of key/value pairs of metadata for trainee.

Returns:

A trainee object

Return type:

howso.openapi.models.Trainee

howso.utilities.validate_case_indices(case_indices, thorough=False)#

Validate the case_indices parameter to the react() method of a Howso client.

Raises a ValueError if case_indices has sequences that do not contain the expected data types of (str, int).

Parameters:
  • case_indices (Iterable of Sequence[str, int]) – The case_indices argument to validate.

  • thorough (bool, default False) – Whether to verify the data types in all sequences or only some (for performance)

Return type:

None
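
The trade-off behind the thorough flag can be sketched as checking either every sequence or only a leading sample. `check_case_indices` and its `sample_size` parameter are hypothetical; how many sequences the real function inspects when thorough is False is not stated here.

```python
from itertools import islice


def check_case_indices(case_indices, thorough=False, sample_size=10):
    """Raise ValueError if an inspected sequence is not a (str, int) pair."""
    candidates = case_indices if thorough else islice(case_indices, sample_size)
    for item in candidates:
        if (
            len(item) != 2
            or not isinstance(item[0], str)
            or not isinstance(item[1], int)
        ):
            raise ValueError(f"Expected a (str, int) sequence, got {item!r}")


check_case_indices([("session_a", 0), ("session_a", 1)])  # passes silently
```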

howso.utilities.validate_datetime_iso8061(datetime_value, feature)#

Check that the passed in datetime value adheres to the ISO 8601 format.

Warn the user if it doesn’t check out.

Parameters:
  • datetime_value (str) – The date value as a string

  • feature (str) – Name of feature

howso.utilities.validate_features(features, extended_feature_types=None)#

Validate the feature types in features.

Parameters:
  • features (dict) –

    The dict of feature name to feature attributes.

    The valid feature types are:

    1. “nominal”

    2. “continuous”

    3. “ordinal”

    4. any types passed in via extended_feature_types

  • extended_feature_types (list of str, optional) – (Optional) If a list is passed in, the feature types specified in the list will be considered valid feature types.

Return type:

None
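
The type check can be pictured as membership in a set of allowed types. `check_feature_types` is a hypothetical helper, and raising ValueError on an unknown type is an assumption of this sketch.

```python
BASE_TYPES = {"nominal", "continuous", "ordinal"}


def check_feature_types(features, extended_feature_types=None):
    """Raise ValueError if any feature has a type outside the allowed set."""
    valid = BASE_TYPES | set(extended_feature_types or [])
    for name, attributes in features.items():
        feature_type = attributes.get("type")
        if feature_type not in valid:
            raise ValueError(f"Feature {name!r} has invalid type {feature_type!r}")


check_feature_types({"age": {"type": "continuous"}, "grade": {"type": "ordinal"}})
check_feature_types({"id": {"type": "custom"}}, extended_feature_types=["custom"])
```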

howso.utilities.validate_list_shape(values, dimensions, variable_name, var_types, allow_none=True)#

Validate the shape of a list.

Raise a ValueError if it does not match expected number of dimensions.

Parameters:
  • values (Collection or None) – A single or multidimensional list.

  • dimensions (int) – The number of dimensions the list should be.

  • variable_name (str) – The variable name for output.

  • var_types (str) – The expected type of the data.

  • allow_none (bool, default True) – If None should be allowed.

Return type:

None