Derived Features#
Objectives & Takeaways#
- Definitions of derived features, an understanding of how to use them, and guidance on which situations and data they are appropriate for.
Prerequisite#
- You’ve successfully installed Howso Engine 
- You have an understanding of Howso’s basic workflow 
Data#
The dataset for this recipe highlights one of the common use-cases for derived features
and can be downloaded here. This dataset
consists of a start time, an end time, and a duration column. We will use derived features
to ensure that the end time is equal to the start time plus the duration.
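The relationship described above can be sketched with the standard library alone. The rows below are hypothetical stand-ins for the dataset, the helper name `end_is_consistent` is illustrative, and the duration is assumed to be in seconds (as the validation step later in this guide suggests).

```python
from datetime import datetime, timedelta

# Hypothetical rows shaped like the dataset: (start, duration in seconds, end).
rows = [
    (datetime(2023, 1, 1, 8, 0, 0), 90, datetime(2023, 1, 1, 8, 1, 30)),
    (datetime(2023, 1, 2, 23, 59, 30), 45, datetime(2023, 1, 3, 0, 0, 15)),
]


def end_is_consistent(start, duration_seconds, end):
    """Return True when end equals start plus the duration."""
    return start + timedelta(seconds=duration_seconds) == end


# Check the invariant on every row.
consistent = [end_is_consistent(*row) for row in rows]
print(consistent)
```

This is the invariant that a derived feature is meant to preserve in generated cases, where independently generated values would not be guaranteed to satisfy it.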
Concepts & Terminology#
How-To guide#
Here we will define a derived feature and then react to the dataset. This will ensure that the features maintain their relationships.
Load Data#
First, we load the data using Pandas. Note that the data are stored as a Parquet file in order to preserve the datetime data types.
# These are the necessary imports for this user guide:
import datetime
import pandas as pd
from howso.engine import Trainee
from howso.utilities import infer_feature_attributes
# Load in the data using pandas
df = pd.read_parquet('data/dates_generated.parquet')
df
Define Derived Feature Code#
Derived features use code similar to Amalgam (howsoai/amalgam) to define a relationship between features. Then, rather than being predicted, the feature is derived according to that code.
To do this, we create a partial feature attributes dictionary which will be fed to
infer_feature_attributes(). In the partial feature attributes
dictionary, we define the derived feature code, which instructs Engine how to derive
the end feature as a function of the start and duration features.
partial_features = {
    'end': {
        'derived_feature_code': "(+ (call value {feature \"start\"}) (call value {feature \"duration\"}))",
    }
}
The derived feature code that we use, (+ (call value {feature "start"}) (call value {feature "duration"}))
instructs Engine to add feature values of duration to start.
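To make the structure of that expression easier to see, here is a minimal sketch that assembles the same string from feature names. The helpers `feature_value` and `sum_of_features` are hypothetical and not part of Howso Engine; they only reproduce the addition expression used in this guide and make no claim about other operators.

```python
def feature_value(name):
    """Return the lookup expression for one feature, as written in this guide."""
    return '(call value {feature "' + name + '"})'


def sum_of_features(*names):
    """Build derived feature code that adds the named features together."""
    return "(+ " + " ".join(feature_value(n) for n in names) + ")"


# Rebuild the expression used for the end feature.
code = sum_of_features("start", "duration")
print(code)

# The resulting string can be placed in a partial feature attributes dictionary.
partial_features = {"end": {"derived_feature_code": code}}
```

Building the string this way avoids escaping the inner double quotes by hand, which is where typos are most likely.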
Map Data#
Now we can use infer_feature_attributes() to understand the properties
and characteristics of the data.
features = infer_feature_attributes(df, features=partial_features)
By supplying the partial feature attributes we defined in the Define Derived Feature Code step, the derived feature code will
be populated for the end feature.
Train and Analyze#
Here the original data are trained into Howso Engine, so that it understands relationships between all data points.
trainee = Trainee(features=features)
trainee.train(df)
trainee.analyze()
React#
Here we perform a generative react to generate 5 cases.
reaction = trainee.react(
    action_features=['start', 'end', 'duration'],
    derived_action_features=['end'],
    desired_conviction=5,
    generate_new_cases='no',
    num_cases_to_generate=5,
)
synth_df = reaction['action']
synth_df['end'] = synth_df.end.apply(
    lambda x: datetime.datetime.fromtimestamp(x)
)
The derived_action_features parameter instructs Engine to derive the end feature rather than generating it.
Finally, we can validate that the derivation behaved as expected:
for i, row in synth_df.iterrows():
    assert row.start + pd.to_timedelta(row.duration, unit='s') == row.end
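The assertion loop above stops at the first failing row. As a sketch of an alternative, the function below collects the indices of every inconsistent row instead. It assumes, as the conversion in the React step implies, that end arrives as epoch seconds; it uses timezone-aware UTC datetimes so the result does not depend on the local timezone, which differs from the naive conversion used in this guide. The name `find_mismatches` and the sample rows are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def find_mismatches(rows):
    """Return indices of rows where start + duration differs from end.

    Each row is (start, duration_seconds, end_epoch_seconds); end is
    converted from epoch seconds to a UTC datetime before comparing.
    """
    bad = []
    for i, (start, duration, end_ts) in enumerate(rows):
        end = datetime.fromtimestamp(end_ts, tz=timezone.utc)
        if start + timedelta(seconds=duration) != end:
            bad.append(i)
    return bad


# Hypothetical rows: the first is consistent, the second is off by 30 seconds.
start = datetime(2023, 1, 1, 8, 0, tzinfo=timezone.utc)
rows = [
    (start, 90, (start + timedelta(seconds=90)).timestamp()),
    (start, 90, (start + timedelta(seconds=120)).timestamp()),
]
mismatches = find_mismatches(rows)
print(mismatches)
```

With a correctly derived end feature, the returned list would be expected to be empty.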
Complete Code#
The code from all of the steps in this guide is combined below:
# These are the necessary imports for this user guide:
import datetime
import pandas as pd
from howso.engine import Trainee
from howso.utilities import infer_feature_attributes
# Load in the data using pandas
df = pd.read_parquet('data/dates_generated.parquet')
df
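# Define the derived feature code for the end feature
partial_features = {
    'end': {
        'derived_feature_code': "(+ (call value {feature \"start\"}) (call value {feature \"duration\"}))",
    }
}
# Infer the remaining feature attributes from the data
features = infer_feature_attributes(df, features=partial_features)
# Train and analyze
trainee = Trainee(features=features)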
trainee.train(df)
trainee.analyze()
reaction = trainee.react(
    action_features=['start', 'end', 'duration'],
    derived_action_features=['end'],
    desired_conviction=5,
    generate_new_cases='no',
    num_cases_to_generate=5,
)
synth_df = reaction['action']
synth_df['end'] = synth_df.end.apply(
    lambda x: datetime.datetime.fromtimestamp(x)
)
for i, row in synth_df.iterrows():
    assert row.start + pd.to_timedelta(row.duration, unit='s') == row.end