Utilities Package#
from howso.utilities import ...
This module contains various utilities for the Howso clients.
- exception howso.utilities.StopExecution#
Bases:
Exception
Raise a StopExecution as this is a cleaner exit() for Notebooks.
- class howso.utilities.FeatureAttributesBase(feature_attributes, params={}, unsupported=[])#
Bases:
dict
Provides accessor methods for and dict-like access to inferred feature attributes.
- get_names(*, types=None, without=None)#
Get feature names associated with this FeatureAttributes object.
- Parameters:
types (String, Container (of String), default None) – (Optional) A feature type as a string (e.g., ‘continuous’) or a list of feature types used to limit the output feature names.
without (Iterable of String) – (Optional) An Iterable of feature names to exclude from the return object.
- Returns:
A list of feature names.
- Return type:
List of String
- get_parameters()#
Get the keyword arguments used with the initial call to infer_feature_attributes.
- Returns:
A dictionary containing the kwargs used in the call to infer_feature_attributes.
- Return type:
Dict
- to_json()#
Get a JSON string representation of this FeatureAttributes object.
- Returns:
A JSON representation of the inferred feature attributes.
- Return type:
String
- abstract validate(coerce=False, raise_errors=False, validate_bounds=True, allow_missing_features=False, localize_datetimes=True)#
Validate the given data against this FeatureAttributes object.
Check that feature bounds and data types loosely describe the data. Optionally attempt to coerce the data into conformity.
- Parameters:
data (Any) – The data to validate.
coerce (bool (default False)) – Whether to attempt to coerce DataFrame columns into correct data types. Coerced datetimes will be localized to UTC.
raise_errors (bool (default False)) – If True, raises a ValueError if nonconforming columns are found; else issue a warning
validate_bounds (bool (default True)) – Whether to validate the data against the attributes’ inferred bounds
allow_missing_features (bool (default False)) – Allows features that are missing from the DataFrame to be ignored
localize_datetimes (bool (default True)) – Whether to localize datetime features to UTC.
- Returns:
None or the coerced DataFrame if ‘coerce’ is True and there were no errors.
- Return type:
None | DataFrame
- class howso.utilities.FeatureType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)#
Bases:
Enum
Feature type enum.
- class howso.utilities.LocaleOverride(language_code, encoding=None, category=6)#
Bases:
object
Implements a thread-safe context manager for switching locales temporarily.
Background#
Python’s locale.setlocale() is not thread safe. In order to work with alternate locales temporarily, this ContextDecorator will use a thread lock on __enter__ and release said lock on __exit__.
Important Notes#
All other threads will be blocked within the scope of the context. It is important to avoid time-consuming execution inside.
Example Usage#
>>> # Parse a date string from French and format it in English.
>>> # The system locale is 'en_US' (in this example).
>>> import locale
>>> from datetime import datetime
>>> from howso.utilities import LocaleOverride
>>> dt_format = '<some format>'
>>> dt_value = '<some French date string>'
>>> with LocaleOverride('fr_FR', category=locale.LC_TIME):
...     # We're in the French date-formatting zone here...
...     dt_obj = datetime.strptime(dt_value, dt_format)
>>> # Back in the 'en_US' locale again.
>>> dt_value = dt_obj.strftime(dt_format)
- Parameters:
language_code (str) – A language code, usually given as one of:
2 lower case letters for the base language. Ex: fr for French.
5 characters such as fr_CA, where the first 2 designate the base language (French in this example), followed by an _, followed by 2 upper case characters designating the country-specific dialect (Canada, in this example). This example designates the French-Canadian locale.
Any of the above, plus an optional encoding following a ‘.’. Ex: fr_FR.UTF-8
encoding (str) – An encoding such as ‘UTF-8’ or ‘ISO8859-1’, etc. If not provided and there is no embedded encoding within the language_code parameter, ‘UTF-8’ is used. If an encoding is embedded in the language_code parameter and an explicit encoding is provided here, the embedded encoding is dropped and ignored.
category (int) – One of the constants set within the locale module. See: https://docs.python.org/3.9/library/locale.html for details. locale.LC_ALL is used if nothing is provided.
- restore()#
Restore the original locale and release the thread lock.
Use this method directly to restore the current context when not using this class as a context manager.
- setup()#
Set a thread lock and the locale as desired.
Use this method directly to set up a locale context when not using this class as a context manager.
- class howso.utilities.MultiTableFeatureAttributes(feature_attributes, params={}, unsupported=[])#
Bases:
FeatureAttributesBase
A dict-like object containing feature attributes for multiple tables.
- class howso.utilities.ProgressTimer(total_ticks=100, *, start_tick=0)#
Bases:
Timer
Monitor progress of a task.
- Parameters:
total_ticks (int, default 100) – The total number of ticks in the progress meter.
start_tick (int, default 0) – The starting tick.
- reset()#
Reset the progress timer.
- Return type:
None
- start()#
Start the progress timer.
- Return type:
- update(ticks=1)#
Update the progress by given ticks.
- Parameters:
ticks (int, default 1) – The number of ticks to increment/decrement by.
- Return type:
None
- property is_complete: bool#
If progress has reached completion.
- property progress: float#
The current progress percentage.
- property tick_duration: timedelta | None#
The duration since the last tick.
- Returns:
The duration since the last tick, or None if not yet started.
- Return type:
timedelta or None
- property time_remaining: timedelta#
The estimated time remaining.
- Returns:
The time estimated to be remaining.
- Return type:
timedelta
- Raises:
ValueError – If timer not yet started.
- class howso.utilities.SingleTableFeatureAttributes(feature_attributes, params={}, unsupported=[])#
Bases:
FeatureAttributesBase
A dict-like object containing feature attributes for a single table or DataFrame.
- has_unsupported_data(feature_name)#
Returns whether the given feature has data that is unsupported by Howso Engine.
- Parameters:
feature_name (str) – The feature to check.
- Returns:
Whether feature_name was determined to have unsupported data.
- Return type:
bool
- validate(**kwargs)#
Validate the given single table data against this FeatureAttributes object.
Check that feature bounds and data types loosely describe the data. Optionally attempt to coerce the data into conformity.
- Parameters:
data (Any) – The data to validate (single table only).
coerce (bool (default False)) – Whether to attempt to coerce DataFrame columns into correct data types.
raise_errors (bool (default False)) – If True, raises a ValueError if nonconforming columns are found; else, issue a warning.
validate_bounds (bool (default True)) – Whether to validate the data against the attributes’ inferred bounds.
allow_missing_features (bool (default False)) – Allows features that are missing from the DataFrame to be ignored.
localize_datetimes (bool (default True)) – Whether to localize datetime features to UTC.
- Returns:
None or the coerced DataFrame if ‘coerce’ is True and there were no errors.
- Return type:
None | DataFrame
- class howso.utilities.Timer#
Bases:
object
Simple context manager to capture run duration of the inner context.
Usage:
with Timer() as my_timer:
    # perform time-consuming task here...
    ...
print(f"The task took {my_timer.duration}.")
Results in:
"The task took 1:30:10.454419."
- end()#
End the timer.
- Return type:
None
- reset()#
Reset the timer.
- Return type:
None
- property duration: timedelta | None#
The total computed duration of the timer.
- Returns:
The total duration of the timer. When the timer has not yet ended, the duration between now and when the timer started will be returned. If the timer has not yet started, returns None.
- Return type:
timedelta or None
- property has_ended: bool#
If the timer has ended.
- property has_started: bool#
If the timer has started.
- property seconds: float | None#
The total seconds representing the duration of timer instance.
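The duration semantics described for Timer (None before starting, a running value while active, a fixed value once ended) can be sketched as follows. SketchTimer is a hypothetical stand-in written for illustration; it is not the howso class and omits end() and reset().

```python
from datetime import datetime, timedelta


class SketchTimer:
    """Illustrative context manager that records how long its body ran."""

    def __init__(self):
        self.start_time = None
        self.end_time = None

    def __enter__(self):
        self.start_time = datetime.now()
        self.end_time = None
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.end_time = datetime.now()
        return False

    @property
    def has_started(self):
        return self.start_time is not None

    @property
    def has_ended(self):
        return self.end_time is not None

    @property
    def duration(self):
        # None before start; running duration until ended; fixed afterwards.
        if self.start_time is None:
            return None
        return (self.end_time or datetime.now()) - self.start_time

    @property
    def seconds(self):
        duration = self.duration
        return None if duration is None else duration.total_seconds()
```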
- class howso.utilities.UserFriendlyExit(verbose=False)#
Bases:
object
Return a callable that, when called, simply prints msg and cleanly exits.
- Parameters:
verbose (bool) – If True, emit more information
- howso.utilities.align_data(x, y=None)#
Check and fix type problems with the data and reshape it.
x is a Matrix and y is a vector.
- Parameters:
x (numpy.ndarray) – Feature values ndarray.
y (numpy.ndarray, default None) – Target values ndarray.
- Return type:
numpy.ndarray, numpy.ndarray or numpy.ndarray
- howso.utilities.build_react_series_df(react_series_response, series_index=None)#
Build a DataFrame from the response from react_series.
If series_index is set, use that as a name for an additional feature that will be the series index.
- Parameters:
react_series_response (Dictionary) – The response dictionary from a call to react_series.
series_index (String) – The name of the series index feature, which will index each series in the form ‘series_<idx>’, e.g., series_1, series_2, …, series_n. If None, does not include the series index feature in the returned DataFrame.
- Returns:
A Pandas DataFrame defined by the action features and series data in the react_series response. Optionally includes a series index feature.
- Return type:
pd.DataFrame
- howso.utilities.check_feature_names(features, expected_feature_names, raise_error=False)#
Check if the feature names in the features dict match expected_feature_names.
- Parameters:
features (Mapping) – A feature dictionary that maps feature names to their attributes.
expected_feature_names (Collection) – A list (or a set) of expected column names in the given features dictionary.
raise_error (bool, defaults to False) – Raise a ValueError if the feature names in features don’t match expected_feature_names.
- Returns:
Returns True if the feature names in features match the expected feature names passed via expected_feature_names. Otherwise, returns False.
- Return type:
bool
- Raises:
ValueError – If raise_error is True and the feature names in the features dict don’t match the feature names in expected_feature_names.
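A rough sketch of the comparison check_feature_names describes is given below. It assumes that order is irrelevant and that the names must match exactly as sets; the function name check_names_sketch and the error message are hypothetical, and the real function may differ.

```python
def check_names_sketch(features, expected_feature_names, raise_error=False):
    """Compare the keys of a feature mapping with the expected names."""
    # Assumption: order does not matter, so compare as sets.
    matches = set(features) == set(expected_feature_names)
    if not matches and raise_error:
        raise ValueError("Feature names do not match the expected feature names.")
    return matches
```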
- howso.utilities.date_format_is_iso(f)#
Check if datetime format is ISO8601.
Does format match the iso8601 set that can be handled by the C parser? Generally of form YYYY-MM-DDTHH:MM:SS - date separator can be different but must be consistent. Leading 0s in dates and times are optional.
Sourced from Pandas: pandas-dev/pandas
- howso.utilities.date_to_epoch(date_obj, time_format)#
Convert date into epoch (i.e., seconds counted from Jan 1st 1970).
Note
If date_obj is None or nan, it will be returned as is.
- Parameters:
date_obj (str or datetime.date or datetime.time or datetime.datetime) – Time object.
time_format (str) – Specify format of the time. Ex:
%a %b %d %H:%M:%S %Y
- Returns:
The epoch date as a floating point value or ‘np.nan’, et al.
- Return type:
Union[str, float]
- howso.utilities.deep_update(base, updates)#
Update dict base with updates from dict updates in a “deep” fashion.
NOTE: This is a recursive function. Care should be taken to ensure that neither of the input dictionaries are self-referencing.
- Parameters:
base (dict) – A dictionary
updates (dict) – A dictionary of updates
- Returns:
The updated dictionary.
- Return type:
dict
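The “deep” update described above can be illustrated with a short recursive sketch. It assumes that nested dictionaries are merged key by key, that any other value is overwritten, and that base is modified in place; deep_update_sketch is a hypothetical name, and the library's behavior may differ in these details.

```python
from collections.abc import Mapping


def deep_update_sketch(base, updates):
    """Recursively merge `updates` into `base` and return `base`."""
    for key, value in updates.items():
        if isinstance(value, Mapping) and isinstance(base.get(key), Mapping):
            # Both sides are mappings: merge them instead of replacing.
            deep_update_sketch(base[key], value)
        else:
            # Otherwise the update simply overwrites the existing value.
            base[key] = value
    return base
```

As the note above warns, a self-referencing dictionary would make a recursion like this one never terminate.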
- howso.utilities.deserialize_cases(data, columns, features=None)#
Deserialize case data into a DataFrame.
If feature attributes contain original typing information, columns will be converted to the same data type as original training cases.
- Parameters:
data (list of list or list of dict) – The context data.
columns (list of str) –
The case column mapping.
The order corresponds to how the data will be mapped to columns in the output. Ignored for list of dict where the dict key is the column name.
features (dict, default None) –
(Optional) The dictionary of feature name to feature attributes.
If not specified, no column typing will be attempted.
- Returns:
The deserialized data.
- Return type:
pandas.DataFrame
- howso.utilities.determine_iso_format(str_date, fname)#
Determine which specific ISO8601 format the passed in date is in.
Specifically if it’s just a date, if it’s zoned, and if zoned, whether it’s a zone or an offset.
- Parameters:
str_date (str) – The Date time passed in as a string.
fname (str) – Name of feature to guess bounds for.
- Returns:
The ISO_8601 format string that most matches the passed in date.
- Return type:
str
- howso.utilities.dprint(debug, *argc, **kwargs)#
Print based on debug levels.
- Parameters:
debug (bool or int) – If true, user_debug level would be 1. Possible levels: 1, 2, 3 (print all)
kwargs –
default_priority (int, default 1) – The message is printed only if debug >= default_priority.
Examples
>>> dprint(True, "hello", "howso", priority=1)
hello howso
- howso.utilities.epoch_to_date(epoch, time_format, tzinfo=None)#
Convert epoch to date if epoch is not None or nan; otherwise, return it as is.
- Parameters:
epoch (Union[str, float]) – The epoch date as a floating point value (or str if np.nan, et al)
time_format (str) – Specify format of the time. Ex:
%a %b %d %H:%M:%S %Y
tzinfo (datetime.tzinfo, optional) – Time zone information to include in datetime.
- Returns:
A date string in the format similar to “Wed May 21 00:00:00 2008”
- Return type:
str
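The conversions performed by date_to_epoch and epoch_to_date can be approximated with the standard datetime module. The sketch below handles only date strings, treats naive values as UTC (an assumption made here for reproducibility), and skips the None/nan pass-through; the function names are hypothetical.

```python
from datetime import datetime, timezone


def to_epoch_sketch(date_str, time_format):
    """Parse a date string and return seconds since Jan 1st 1970 (UTC)."""
    parsed = datetime.strptime(date_str, time_format)
    if parsed.tzinfo is None:
        # Assumption: interpret naive datetimes as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.timestamp()


def from_epoch_sketch(epoch, time_format, tzinfo=timezone.utc):
    """Format an epoch value as a date string in the given time zone."""
    return datetime.fromtimestamp(epoch, tz=tzinfo).strftime(time_format)


fmt = "%Y-%m-%dT%H:%M:%S"
epoch = to_epoch_sketch("1970-01-02T00:00:00", fmt)  # 86400.0
round_trip = from_epoch_sketch(epoch, fmt)           # '1970-01-02T00:00:00'
```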
- howso.utilities.format_dataframe(df, features)#
Format DataFrame columns to original type using feature attributes.
Note
Modifies DataFrame in place.
- Parameters:
df (pandas.DataFrame) – The DataFrame to format columns of.
features (Dict) – The dictionary of feature name to feature attributes.
- Returns:
The formatted data.
- Return type:
pandas.DataFrame
- howso.utilities.get_kwargs(kwargs, descriptors, warn_on_extra=False)#
Decompose kwargs into a tuple of return values.
Each tuple corresponds to a descriptor in ‘descriptors’. Optionally issue a warning on any items in kwargs that are not “consumed” by the descriptors.
- Parameters:
kwargs (dict) – Mapping of keys and values (kwargs)
descriptors –
An iterable of descriptors for how to handle each item in kwargs. Each descriptor can be a mapping, another iterable, or a single string.
If a mapping, it must at least include the key: ‘key’ but can also optionally include the keys: ‘default’ and ‘test’.
If a non-mapping iterable, the values will be interpreted as ‘key’, ‘default’, ‘test’, in that order. Only the first is absolutely required; the remaining will be evaluated to None if not provided.
If a string is provided, it is used as the ‘key’. ‘default’ and ‘test’ are set to None.
If a ‘key’ is not found in the kwargs, then the ‘default’ value is returned.
If a descriptor contains a ‘test’, it should be a callable that returns a boolean. If False, the ‘default’ value is returned.
If the ‘default’ provided is an instance of an Exception, then, the exception is raised when the ‘key’ is not present, or the ‘test’ fails.
warn_on_extra (bool) – If True, will issue warnings about any keys provided in kwargs that were not consumed by the descriptors. Default is False
- Returns:
A tuple of the found values in the same order as the provided descriptor.
- Raises:
Exception – May raise any exception given as a ‘default’ in the descriptors.
Usage#
An example of usage showing various ways to use descriptors:
>>> def my_method(self, required, **kwargs):
>>>     apple, banana, cherry, durian, elderberry = get_kwargs(kwargs, (
>>>         # A simple string is interpreted as the 'key' with a 'default' of
>>>         # `None` and no test. Very common use-case made simple.
>>>         'apple',
>>>
>>>         # Another common use-case. Set value to 5 if not in kwargs.
>>>         # This also shows using a tuple for the descriptor.
>>>         ('banana', 5),
>>>
>>>         # Verbose input including a test using dict
>>>         {'key': 'cherry', 'default': 5, 'test': lambda x: x > 0},
>>>
>>>         # The test, `is_durian`, is defined elsewhere
>>>         ('durian', None, is_durian),
>>>
>>>         # Full example using iterable descriptor rather than mapping.
>>>         ('elderberry', ValueError('"elderberry" must be > 5.'),
>>>          lambda x: x > 5),
>>>     ))
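The descriptor rules listed for get_kwargs can be sketched as a small standalone function. This is an illustrative reading of those rules, not the library's code: get_kwargs_sketch is a hypothetical name, and the warning text is invented for the example.

```python
import warnings
from collections.abc import Mapping


def get_kwargs_sketch(kwargs, descriptors, warn_on_extra=False):
    """Return one value per descriptor, applying defaults and tests."""
    results = []
    consumed = set()
    for descriptor in descriptors:
        # Normalize every descriptor form to (key, default, test).
        if isinstance(descriptor, str):
            key, default, test = descriptor, None, None
        elif isinstance(descriptor, Mapping):
            key = descriptor["key"]
            default = descriptor.get("default")
            test = descriptor.get("test")
        else:
            parts = list(descriptor) + [None, None]
            key, default, test = parts[:3]

        consumed.add(key)
        value = kwargs.get(key)
        failed = key not in kwargs or (test is not None and not test(value))
        if failed:
            # An Exception instance as the default is raised, not returned.
            if isinstance(default, Exception):
                raise default
            value = default
        results.append(value)

    if warn_on_extra:
        for extra in sorted(set(kwargs) - consumed):
            warnings.warn(f"Unused keyword argument: {extra}")
    return tuple(results)
```

Normalizing each descriptor to a (key, default, test) triple first keeps the lookup logic in one place, whichever of the three descriptor forms the caller used.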
- howso.utilities.get_matrix_diff(matrix)#
Calculates the absolute differences between the values of feature pairs in a matrix.
- Parameters:
matrix (DataFrame) – The matrix in DataFrame format.
- Returns:
Sorted dictionary of absolute differences between the feature value pairs. The values are stored in a dictionary with keys consisting of a tuple of the features.
- Return type:
dict
- howso.utilities.infer_feature_attributes(data, *, tables=None, time_feature_name=None, **kwargs)#
Return a dict-like feature attributes object with useful accessor methods.
The returned object is a subclass of FeatureAttributesBase that is appropriate for the provided data type.
- Parameters:
data (Any) – The data source to infer feature attributes from. Must be a supported data type.
tables (Iterable of TableNameProtocol) –
(Optional, required for datastores) An Iterable of table names to infer feature attributes for.
If included, feature attributes will be generated in the form {table_name: {feature_attribute: value}}.
time_feature_name (str, default None) – (Optional, required for time series) The name of the time feature.
features (dict or None, default None) –
(Optional) A partially filled features dict. If partially filled attributes for a feature are passed in, those parameters will be retained as is and the rest of the attributes will be inferred.
- For example:
>>> from pprint import pprint
>>> df.head(2)
...    sepal-length  sepal-width  petal-length  petal-width  target
... 0           6.7          3.0           5.2          2.3       2
... 1           6.0          2.2           5.0          1.5       2
>>> # Partially filled features dict
>>> partial_features = {
...     "sepal-length": {
...         "type": "continuous",
...         'bounds': {
...             'min': 2.72,
...             'max': 3,
...             'allow_null': True
...         },
...     },
...     "sepal-width": {
...         "type": "continuous"
...     }
... }
>>> # Infer rest of the attributes
>>> features = infer_feature_attributes(
...     df, features=partial_features
... )
>>> # Inferred Feature dictionary
>>> pprint(features)
... {
...     'sepal-length': {
...         'bounds': {
...             'allow_null': True, 'max': 3, 'min': 2.72
...         },
...         'type': 'continuous'
...     },
...     'sepal-width': {
...         'bounds': {
...             'allow_null': True, 'max': 7.38905609893065,
...             'min': 1.0
...         },
...         'type': 'continuous'
...     },
...     'petal-length': {
...         'bounds': {
...             'allow_null': True, 'max': 7.38905609893065,
...             'min': 1.0
...         },
...         'type': 'continuous'
...     },
...     'petal-width': {
...         'bounds': {
...             'allow_null': True, 'max': 2.718281828459045,
...             'min': 0.049787068367863944
...         },
...         'type': 'continuous'
...     },
...     'target': {
...         'bounds': {'allow_null': True},
...         'type': 'nominal'
...     }
... }
Note that valid ‘data_type’ values for both nominal and continuous types are: ‘string’, ‘number’, ‘json’, ‘amalgam’, and ‘yaml’. The ‘boolean’ data_type is valid only when type is nominal. ‘string_mixable’ is valid only when type is continuous (predicted values may result in interpolated strings containing a combination of characters from multiple original values).
infer_bounds (bool, default True) – (Optional) If True, bounds will be inferred for the features if the feature column has at least one non-NaN value.
datetime_feature_formats (dict, default None) –
(Optional) Dict defining a custom (non-ISO8601) datetime format and an optional locale for features with datetimes. By default datetime features are assumed to be in ISO8601 format. Non-English datetimes must have locales specified. If locale is omitted, the default system locale is used. The keys are the feature name, and the values are a tuple of date time format and locale string.
Example:
{ "start_date": ("%Y-%m-%d %A %H.%M.%S", "es_ES"), "end_date": "%Y-%m-%d" }
delta_boundaries (dict, default None) –
(Optional) For time series, specify the delta boundaries in the form {“feature” : {“min|max” : {order : value}}}. Works with partial values by specifying only particular order of derivatives you would like to overwrite. Invalid orders will be ignored.
Examples:
{ "stock_value": { "min": { '0' : 0.178, '1': 3.4582e-3, '2': None } } }
derived_orders (dict, default None) – (Optional) Dict of features to the number of orders of derivatives that should be derived instead of synthesized. For example, for a feature with a 3rd order of derivative, setting its derived_orders to 2 will synthesize the 3rd order derivative value, and then use that synthesized value to derive the 2nd and 1st order.
lags (list or dict, default None) –
(Optional) A list containing the specific indices of the desired lag features to derive for each feature (not including the series time feature). Specifying derived lag features for the feature specified by time_feature_name must be done using a dictionary. A dictionary can be used to specify a list of specific lag indices for specific features. For example: {“feature1”: [1, 3, 5]} would derive three different lag features for feature1. The resulting lag features hold values 1, 3, and 5 timesteps behind the current timestep respectively.
Note
Using the lags parameter will override the num_lags parameter per feature
Note
A lag feature is a feature that provides a “lagging value” to a case by holding the value of a feature from a previous timestep. These lag features allow for cases to hold more temporal information.
num_lags (int or dict, default None) –
(Optional) An integer specifying the number of lag features to derive for each feature (not including the series time feature). Specifying derived lag features for the feature specified by time_feature_name must be done using a dictionary. A dictionary can be used to specify numbers of lags for specific features. Features that are not specified will default to 1 lag feature.
Note
The num_lags parameter will be overridden by the lags parameter per feature.
orders_of_derivatives (dict, default None) – (Optional) Dict of features and their corresponding order of derivatives for the specified type (delta/rate). If provided will generate the specified number of derivatives and boundary values. If set to 0, will not generate any delta/rate features. By default all continuous features have an order value of 1.
rate_boundaries (dict, default None) –
(Optional) For time series, specify the rate boundaries in the form {“feature” : {“min|max” : {order : value}}}. Works with partial values by specifying only particular order of derivatives you would like to overwrite. Invalid orders will be ignored.
Examples:
{ "stock_value": { "min": { '0' : 0.178, '1': 3.4582e-3, '2': None } } }
tight_bounds (Iterable of str, default None) – (Optional) Set tight min and max bounds for the features specified in the Iterable.
tight_time_bounds (bool, default False) – (Optional) If True, will set tight bounds on time_feature. This will cause the bounds for the start and end times to be set to the same bounds as observed in the original data.
time_feature_is_universal (bool, optional) – If True, the time feature will be treated as universal and future data is excluded while making predictions. If False, the time feature will not be treated as universal and only future data within the same series is excluded while making predictions. It is recommended to set this value to True if there is any possibility of global relevancy of time, which is the default behavior.
time_series_type_default (str, default 'rate') – (Optional) Type specifying how time series is generated. One of ‘rate’ or ‘delta’, default is ‘rate’. If ‘rate’, it uses the difference of the current value from its previous value divided by the change in time since the previous value. When ‘delta’ is specified, just uses the difference of the current value from its previous value regardless of the elapsed time.
time_series_types_override (dict, default None) – (Optional) Dict of features and their corresponding time series type, one of ‘rate’ or ‘delta’, used to override time_series_type_default for the specified features.
mode_bound_features (list of str, default None) – (Optional) Explicit list of feature names to use mode bounds for when inferring loose bounds. If None, assumes all features. A mode bound is used instead of a loose bound when the mode for the feature is the same as an original bound, as it may represent an application-specific min/max.
id_feature_name (str or list of str, default None) – (Optional) The name(s) of the ID feature(s).
time_invariant_features (list of str, default None) – (Optional) Names of time-invariant features.
attempt_infer_extended_nominals (bool, default False) –
(Optional) If set to True, detections of extended nominals will be attempted. If the detection fails, the categorical variables will be set to int-id subtype.
Note
Please refer to kwargs for other parameters related to extended nominals.
nominal_substitution_config (dict of dicts, default None) – (Optional) Configuration of the nominal substitution engine and the nominal generators and detectors.
include_extended_nominal_probabilities (bool, default False) – (Optional) If true, extended nominal probabilities will be appended as metadata into the feature object.
ordinal_feature_values (dict, default None) –
(optional) Dict for ordinal string features defining an ordered list of string values for each feature, ordered low to high. If specified will set ‘type’ to be ‘ordinal’ for all features in this map.
Example:
{ "grade" : [ "F", "D", "C", "B", "A" ], "size" : [ "small", "medium", "large", "huge" ] }
dependent_features (dict, default None) –
(Optional) Dict of features with their respective lists of features that either the feature depends on or are dependent on them. Should be used when there are multi-type value features that tightly depend on values based on other multi-type value features.
- Examples:
If there’s a feature name ‘measurement’ that contains measurements such as BMI, heart rate and weight, while the feature ‘measurement_amount’ contains the numerical values corresponding to the measurement, dependent features could be passed in as follows:
{ "measurement": [ "measurement_amount" ] }
Since dependence directionality is not important, this will also work:
{ "measurement_amount": [ "measurement" ] }
- Returns:
A subclass of FeatureAttributesBase (Single/MultiTableFeatureAttributes) that extends dict, thus providing dict-like access to feature attributes and useful accessor methods.
- Return type:
FeatureAttributesBase
Examples
# 'data' is a DataFrame
>> attrs = infer_feature_attributes(data)
# Can access feature attributes like a dict
>> attrs
{
    "feature_one": {
        "type": "continuous",
        "bounds": {"allow_null": True},
    },
    "feature_two": {
        "type": "nominal",
    }
}
>> attrs["feature_one"]
{
    "type": "continuous",
    "bounds": {"allow_null": True}
}
# Or can call methods to do other stuff
>> attrs.get_parameters()
{'type': "continuous"}
# Now 'data' is an object that implements SQLRelationalDatastoreProtocol
>> attrs = infer_feature_attributes(data, tables=tables)
>> attrs
{
    "table_1": {
        "feature_one": {
            "type": "continuous",
            "bounds": {"allow_null": True},
        },
        "feature_two": {
            "type": "nominal",
        }
    },
    "table_2" : {...},
}
>> attrs.to_json()
'{"table_1" : {...}}'
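The difference between the ‘rate’ and ‘delta’ time series types described for time_series_type_default can be shown with a small worked example. The helper below is hypothetical and computes only first-order changes on plain lists; it is meant to illustrate the two definitions, not how Howso derives these features internally.

```python
def first_order_changes(values, times, series_type="rate"):
    """Compute first-order 'delta' or 'rate' changes for one series."""
    changes = []
    for i in range(1, len(values)):
        # 'delta': difference from the previous value, ignoring elapsed time.
        change = values[i] - values[i - 1]
        if series_type == "rate":
            # 'rate': the same difference divided by the change in time.
            change = change / (times[i] - times[i - 1])
        changes.append(change)
    return changes


values = [1, 3, 7]
times = [0, 2, 4]
deltas = first_order_changes(values, times, "delta")  # [2, 4]
rates = first_order_changes(values, times, "rate")    # [1.0, 2.0]
```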
- howso.utilities.is_valid_uuid(value, version=4)#
Check if a given string is a valid uuid.
- Parameters:
value (str or UUID) – The value to test
version (int, optional) – The uuid version (Default: 4)
- Returns:
True if value is a valid uuid string
- Return type:
bool
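One way to perform a check like this with the standard uuid module is sketched below. It is an assumption about the approach, not the library's code, and it is stricter than some callers may want: only the canonical hyphenated string form is accepted. The name is_valid_uuid_sketch is hypothetical.

```python
import uuid


def is_valid_uuid_sketch(value, version=4):
    """Return True if `value` is a UUID (or canonical UUID string) of `version`."""
    if isinstance(value, uuid.UUID):
        return value.version == version
    try:
        # Passing `version` forces the version bits, so a string from a
        # different UUID version will no longer round-trip unchanged.
        parsed = uuid.UUID(str(value), version=version)
    except ValueError:
        return False
    return str(parsed) == str(value).lower()
```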
- howso.utilities.matrix_processing(matrix, normalize=False, normalize_method='relative', ignore_diagonals_normalize=True, absolute=False, fill_diagonal=False, fill_diagonal_value=1)#
Preprocess a matrix including options to normalize, take the absolute value, and fill in the diagonals.
The order of operations for this method is: first normalize, then take the absolute value, and lastly fill in the diagonals. This method automatically sorts the matrix indexes.
- Parameters:
matrix (Dataframe) – Matrix in Dataframe form.
normalize (bool, default False) – Whether to normalize the matrix row wise. Normalization method is set by the normalize_method parameter.
normalize_method (Union[Iterable[Union[str, Callable]], str, Callable], default 'relative') –
The normalization method. The method may be either one of the strings below that correspond to a default method or a custom Callable.
These methods may be passed in as an individual string or in an iterable, where they will be processed sequentially.
Default Methods:
- ‘relative’: normalizes each row by dividing each value by the maximum absolute value in the row.
- ‘fractional’: normalizes each row by dividing each value by the sum of absolute values in the row.
- ‘feature_count’: normalizes each row by dividing by the feature count.
Custom Callable:
- If a custom Callable is provided, then it will be passed onto the DataFrame apply function:
matrix.apply(Callable)
ignore_diagonals_normalize (bool, default True) – Whether to ignore the diagonals when normalizing the matrix. Useful for matrices where the diagonals are a constant value such as correlation matrices.
absolute (bool, default False) – Whether to transform the matrix values into the absolute values.
fill_diagonal (bool, default False) – Whether to fill in the diagonals of the matrix. If set to true, the diagonal values will be filled in based on the fill_diagonal_value value.
fill_diagonal_value (int or float, default 1) – The value to fill in the diagonals with. fill_diagonal must be set to True in order for the diagonal values to be filled in. If fill_diagonal is set to False, then this parameter will be ignored.
- Returns:
Dataframe of the result.
- Return type:
Dataframe
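The three default normalization methods can be illustrated on plain nested lists. This sketch is hypothetical: it ignores the diagonal handling, the absolute-value step, and the index sorting that matrix_processing performs, and it does not guard against rows whose divisor is zero.

```python
def normalize_rows(matrix, method="relative"):
    """Normalize each row of a list-of-lists matrix by the named method."""
    normalized = []
    for row in matrix:
        if method == "relative":
            # Divide by the maximum absolute value in the row.
            divisor = max(abs(v) for v in row)
        elif method == "fractional":
            # Divide by the sum of absolute values in the row.
            divisor = sum(abs(v) for v in row)
        elif method == "feature_count":
            # Divide by the number of features (columns) in the row.
            divisor = len(row)
        else:
            raise ValueError(f"Unknown method: {method}")
        normalized.append([v / divisor for v in row])
    return normalized


relative = normalize_rows([[1, -2], [4, 2]])        # [[0.5, -1.0], [1.0, 0.5]]
fractional = normalize_rows([[1, -3]], "fractional")  # [[0.25, -0.75]]
```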
- howso.utilities.num_list_dimensions(lst)#
Return number of dimensions for a list.
Assumption is that the input nested lists are also lists, or a list of dataframes.
- Parameters:
lst (list) – The nested list of objects.
- Returns:
The number of dimensions in the passed in list.
- Return type:
int
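A dimension count of this kind can be sketched by following the first element at each level of nesting. The helper below is a hypothetical illustration that handles plain lists only (not lists of DataFrames) and assumes the nesting is uniform.

```python
def count_list_dimensions(lst):
    """Count nesting depth by descending through the first element."""
    depth = 0
    current = lst
    while isinstance(current, list):
        depth += 1
        if not current:
            # An empty list still counts as one dimension.
            break
        current = current[0]
    return depth
```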
- howso.utilities.replace_doublemax_with_infinity(dat)#
Replace values of Double.MAX_VALUE (1.79769313486232E+308) with Infinity.
For use when retrieving data from Howso.
- Parameters:
dat (A dict, list, number, or string)
- Return type:
A dict, list, number, or string - same as passed in for translation
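A recursive replacement like the one described can be sketched with sys.float_info.max, which equals the double-precision maximum on standard Python builds. The function name is hypothetical, and mapping the negative maximum to negative infinity is an assumption made for this sketch.

```python
import math
import sys


def doublemax_to_infinity(dat):
    """Recursively replace the largest finite double with infinity."""
    if isinstance(dat, dict):
        return {key: doublemax_to_infinity(value) for key, value in dat.items()}
    if isinstance(dat, list):
        return [doublemax_to_infinity(value) for value in dat]
    if isinstance(dat, float) and abs(dat) == sys.float_info.max:
        # Assumption: keep the sign, so -MAX becomes -inf.
        return math.copysign(math.inf, dat)
    # Strings and all other numbers pass through unchanged.
    return dat
```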
- howso.utilities.replace_nan_with_none(dat)#
Replace NaN values with None values.
For use when feeding data to Howso from the scikit module to account for the different ways howso and sklearn represent missing values.
- Parameters:
dat (list of list of object) – A 2d list of values.
- Return type:
list[list[object]]
- howso.utilities.replace_none_with_nan(dat)#
Replace None values with NaN values.
For use when retrieving data from Howso via the scikit module to conform to sklearn convention on missing values.
- Parameters:
dat (list of dict of key-values)
- Return type:
list[dict]
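The two conversions (replace_nan_with_none on a 2d list, replace_none_with_nan on a list of dicts) can be sketched with the math module. The helper names below are hypothetical, and both return new containers, which may or may not match the library's behavior.

```python
import math


def nan_to_none(rows):
    """Replace float NaN with None in a 2d list of values."""
    return [
        [None if isinstance(v, float) and math.isnan(v) else v for v in row]
        for row in rows
    ]


def none_to_nan(records):
    """Replace None with float NaN in a list of dicts."""
    return [
        {key: (math.nan if value is None else value) for key, value in record.items()}
        for record in records
    ]
```

NaN never compares equal to itself, so results of none_to_nan should be checked with math.isnan rather than ==.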
- howso.utilities.reshape_data(x, y)#
Reshapes X as a matrix and y as a vector.
- Parameters:
x (np.ndarray) – Feature values ndarray.
y (np.ndarray) – Target values ndarray.
- Returns:
X, y
- Return type:
np.ndarray, np.ndarray
- howso.utilities.seconds_to_time(seconds, *, tzinfo=None)#
Convert seconds to a time object.
- Parameters:
seconds (int or float) – The seconds to convert to time.
tzinfo (datetime.tzinfo, optional) – Time zone to use for resulting time object.
- Returns:
The time object.
- Return type:
datetime.time
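The conversion can be sketched with the standard library alone. `seconds_to_time_sketch` below is a hypothetical illustration, not the library's implementation; in particular, wrapping values of 24 hours or more back around midnight is an assumption made here so that the result always fits in a `datetime.time`.

```python
import datetime as dt

MICROS_PER_DAY = 86_400 * 1_000_000


def seconds_to_time_sketch(seconds, *, tzinfo=None):
    """Convert seconds since midnight to a time object (hypothetical sketch)."""
    # Work in whole microseconds and wrap at 24 hours (an assumption of this sketch).
    total_micros = round(seconds * 1_000_000) % MICROS_PER_DAY
    hours, rem = divmod(total_micros, 3_600_000_000)
    minutes, rem = divmod(rem, 60_000_000)
    secs, micros = divmod(rem, 1_000_000)
    return dt.time(hours, minutes, secs, micros, tzinfo=tzinfo)


just_past_one = seconds_to_time_sketch(3661.5)
one_minute_utc = seconds_to_time_sketch(60, tzinfo=dt.timezone.utc)
```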
- howso.utilities.serialize_cases(data, columns, features, *, warn=False)#
Serialize case data into list of lists.
- Parameters:
data (pandas.DataFrame or numpy.ndarray or list of list) – The data to serialize.
columns (list of str) – The case column mapping. The order corresponds to the order of cases in output.
features (dict) – The dictionary of feature name to feature attributes.
warn (bool, default False) – Whether the serializer should raise warnings.
- Returns:
The serialized data from DataFrame.
- Return type:
list of list or None
- howso.utilities.serialize_datetimes(cases, columns, features, *, warn=False)#
Serialize datetimes in the given list of cases, in-place.
Iterates over the passed-in case values and serializes any datetime values according to the datetime format specified in the feature attributes.
- Parameters:
cases (list of list) – A 2d list of case values corresponding to the features of the cases.
columns (list of str) – A list of feature names.
features (dict) – Dictionary of feature attributes.
warn (bool, default: False) – If True, warn the user when the specified datetime format doesn’t match the datetime strings.
- Return type:
None
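The in-place behavior can be illustrated with a short sketch. `serialize_datetimes_sketch` is hypothetical and omits the `warn` handling; it assumes each feature's format is stored under a `date_time_format` key in its attributes, and that only `datetime.datetime` values need converting.

```python
import datetime as dt


def serialize_datetimes_sketch(cases, columns, features):
    """Format datetime values in place using each feature's format (hypothetical sketch)."""
    for row in cases:
        for index, name in enumerate(columns):
            # "date_time_format" is the attribute key assumed by this sketch.
            fmt = features.get(name, {}).get("date_time_format")
            if fmt and isinstance(row[index], dt.datetime):
                row[index] = row[index].strftime(fmt)


cases = [[dt.datetime(2024, 1, 2, 3, 4, 5), 7]]
columns = ["ts", "n"]
features = {
    "ts": {"type": "continuous", "date_time_format": "%Y-%m-%dT%H:%M:%S"},
    "n": {"type": "continuous"},
}
result = serialize_datetimes_sketch(cases, columns, features)  # mutates cases, returns None
```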
- howso.utilities.time_to_seconds(time)#
Convert a time object to seconds since midnight.
- Parameters:
time (datetime.time) – The time to convert.
- Returns:
Seconds since midnight.
- Return type:
float
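"Seconds since midnight" is simply the time's components weighted and summed. The helper `time_to_seconds_sketch` below is a hypothetical illustration rather than the library's code, and it ignores any time zone attached to the value.

```python
import datetime as dt


def time_to_seconds_sketch(time):
    """Return seconds since midnight as a float (hypothetical sketch)."""
    return (
        time.hour * 3600
        + time.minute * 60
        + time.second
        + time.microsecond / 1_000_000  # division makes the result a float
    )


elapsed = time_to_seconds_sketch(dt.time(1, 1, 1, 500000))
```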
- howso.utilities.trainee_from_df(df, features=None, action_features=None, name=None, persistence='allow', trainee_metadata=None)#
Create a Trainee from a dataframe.
Assumes floats are continuous and all other values are nominal.
- Parameters:
df (pandas.DataFrame) – A pandas DataFrame with column names corresponding to feature names. Features that are considered to be continuous should have a dtype of float.
features (Optional[Mapping[str, Mapping]]) – (Optional) A dictionary of feature names to a dictionary of parameters.
action_features (List of String, Default None) – (Optional) List of action features. Anything that’s not in this list will be treated as a context feature. For example, if no action feature is specified the trainee won’t have a target.
name (str or None, defaults to None) – (Optional) The name of the trainee.
persistence (str: default "allow") – The persistence setting to use for the trainee. Valid values: “always”, “allow”, “never”.
trainee_metadata (Mapping, optional) – (Optional) mapping of key/value pairs of metadata for trainee.
- Returns:
A trainee object
- Return type:
howso.openapi.models.Trainee
- howso.utilities.validate_case_indices(case_indices, thorough=False)#
Validate the case_indices parameter to the react() method of a Howso client.
Raises a ValueError if case_indices has sequences that do not contain the expected data types of (str, int).
- Parameters:
case_indices (Iterable of Sequence[str, int]) – The case_indices argument to validate.
thorough (bool, default False) – Whether to verify the data types in all sequences or only in some of them (for performance).
- Return type:
None
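The shape of this check can be sketched as follows. `validate_case_indices_sketch` is hypothetical: the sample size used when `thorough` is False and the requirement that each sequence has exactly two items are assumptions of this sketch, not documented behavior.

```python
import itertools
from collections.abc import Sequence

SAMPLE_SIZE = 3  # how many entries to check when not thorough (assumed value)


def validate_case_indices_sketch(case_indices, thorough=False):
    """Raise ValueError unless each checked entry is a (str, int) sequence (hypothetical sketch)."""
    to_check = case_indices if thorough else itertools.islice(case_indices, SAMPLE_SIZE)
    for entry in to_check:
        valid = (
            isinstance(entry, Sequence)
            and not isinstance(entry, str)
            and len(entry) == 2
            and isinstance(entry[0], str)
            and isinstance(entry[1], int)
        )
        if not valid:
            raise ValueError(f"Invalid case index: {entry!r}; expected (str, int).")


good = [("session_a", 0), ("session_b", 3)]
validate_case_indices_sketch(good, thorough=True)  # passes silently
```

Checking only a prefix trades completeness for speed: a malformed entry late in a long list is caught only when `thorough=True`.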
- howso.utilities.validate_datetime_iso8061(datetime_value, feature)#
Check that the passed in datetime value adheres to the ISO 8601 format.
Warns the user if it does not.
- Parameters:
datetime_value (str) – The date value as a string
feature (str) – Name of feature
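A warn-instead-of-raise check of this kind might look like the sketch below. `warn_if_not_iso8601` is hypothetical and uses `datetime.fromisoformat` as the parser, which is an assumption: before Python 3.11 that method accepts only a subset of ISO 8601, so the library may well rely on a different parser.

```python
import warnings
from datetime import datetime


def warn_if_not_iso8601(datetime_value, feature):
    """Emit a warning when the string does not parse as ISO 8601 (hypothetical sketch)."""
    try:
        datetime.fromisoformat(datetime_value)
    except ValueError:
        warnings.warn(
            f'Feature "{feature}" has value "{datetime_value}", '
            "which is not in ISO 8601 format."
        )


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_if_not_iso8601("2024-01-02T03:04:05", "ts")  # parses, no warning
    warn_if_not_iso8601("01/02/2024", "ts")           # fails to parse, one warning
warning_count = len(caught)
```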
- howso.utilities.validate_features(features, extended_feature_types=None)#
Validate the feature types in features.
- Parameters:
features (dict) –
The dict of feature name to feature attributes.
The valid feature types are:
“nominal”
“continuous”
“ordinal”
along with any passed-in extended_feature_types.
extended_feature_types (list of str, optional) – (Optional) If a list is passed in, the feature types specified in the list will also be considered valid feature types.
- Return type:
None
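The rule described above amounts to a membership test against the base types plus any extensions. In the sketch below, `check_feature_types` is a hypothetical helper; reading the type from a `type` key and raising a `ValueError` on a mismatch are both assumptions, since the reference does not state how failures are reported.

```python
BASE_FEATURE_TYPES = ("nominal", "continuous", "ordinal")


def check_feature_types(features, extended_feature_types=None):
    """Raise ValueError for any feature whose type is not allowed (hypothetical sketch)."""
    valid_types = set(BASE_FEATURE_TYPES)
    if extended_feature_types:
        valid_types.update(extended_feature_types)
    for name, attributes in features.items():
        feature_type = attributes.get("type")  # "type" key is assumed by this sketch
        if feature_type not in valid_types:
            raise ValueError(f'Feature "{name}" has invalid type: {feature_type!r}')


features = {"age": {"type": "continuous"}, "color": {"type": "nominal"}}
check_feature_types(features)  # passes silently

# A custom type is accepted only when it is listed as an extension.
check_feature_types({"blob": {"type": "binary"}}, extended_feature_types=["binary"])
```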
- howso.utilities.validate_list_shape(values, dimensions, variable_name, var_types, allow_none=True)#
Validate the shape of a list.
Raise a ValueError if it does not match expected number of dimensions.
- Parameters:
values (Collection or None) – A single or multidimensional list.
dimensions (int) – The number of dimensions the list should be.
variable_name (str) – The variable name for output.
var_types (str) – The expected type of the data.
allow_none (bool, default True) – Whether None is allowed.
- Return type:
None