Predictions#

Objectives: what you will take away#

  • Definitions & an understanding of basic regression, classification, continuous vs categorical/nominal Action Features, Trainee, react(), react_aggregate().

  • How to perform a basic regression or classification analysis using the Howso Engine to predict the Highway MPG or Fuel Type based on vehicle Context Features.

Prerequisites: before you begin#

Data#

Download 23,606 vehicles from 1984 - 2022, including make, model, MPG, drive-type, size, class and fuel type.

Concepts & Terminology#

Regression - is used to describe the relationship between one or more Context Features and a continuous numeric Action Feature, as in this guide predicting the Highway MPG of a vehicle based on its physical characteristics and year manufactured.

Classification - is used to describe the relationship between one or more Context Features and a categorical/nominal Action Feature, as in this guide predicting the FuelType of a vehicle based on its physical characteristics and year manufactured. For Howso Engine, the action feature may be left in string format and does not need to be converted to numeric format.

Trainee and React - In this simple example, we will create a Trainee that can React to new case data, such as a new car we might be looking to build.

Train and Analyze - To create a Trainee, we will first load data, define the Feature Attributes of the data, and Train the Trainee. The Trainee can be used for many tasks, but because we know exactly what we want to do, we will Analyze to improve the performance of our Trainee by defining the specific set of Context Features that we want to use to predict an Action Feature. The Action Feature in this example will be Highway MPG.

Evaluating the Trainee - To understand the accuracy of the Trainee for our task, we can use the built-in Trainee.react_aggregate() method. Since we are not using a train-test split in this example, react_aggregate() performs a react() on each of the cases trained into the model using a leave-one-out approach.

react_aggregate() returns prediction stats, which let us evaluate regression accuracy statistics such as:

  • R-Squared - \(R^2\) represents how well the predictions fit the data; the closer to 1.0, the better the fit.

  • Mean Absolute Error (MAE) - the average absolute error between actual and predicted values over the whole dataset, expressed in the units of what is being measured.

  • Root Mean Square Error (RMSE) - the square root of the mean squared error over the whole dataset. Like MAE, it is expressed in the units of what is being measured, but it weights large errors more heavily.
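
As a rough illustration of how these three statistics relate to each other, the sketch below computes them by hand for a handful of made-up Highway MPG values. The function name `regression_metrics` and the numbers are assumptions for illustration only; this is not how Howso computes its prediction stats internally.

```python
import math


def regression_metrics(actual, predicted):
    """Return R^2, MAE and RMSE for two equal-length sequences of numbers."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and the same length")

    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    # MAE: average size of the errors, in the units of the action feature
    mae = sum(abs(e) for e in errors) / n

    # RMSE: square root of the mean of the squared errors
    sum_squared_errors = sum(e * e for e in errors)
    rmse = math.sqrt(sum_squared_errors / n)

    # R^2: 1 minus (residual sum of squares / total sum of squares)
    mean_actual = sum(actual) / n
    total_sum_squares = sum((a - mean_actual) ** 2 for a in actual)
    # R^2 is undefined when every actual value is identical
    r2 = 1.0 - sum_squared_errors / total_sum_squares if total_sum_squares else float("nan")

    return {"r2": r2, "mae": mae, "rmse": rmse}


# Made-up Highway MPG values, for illustration only
actual = [30.0, 25.0, 40.0, 35.0]
predicted = [28.0, 27.0, 41.0, 34.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

Because RMSE squares each error before averaging, it is never smaller than MAE on the same predictions.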

Or classification metrics including those derived from the true positive (TP), true negative (TN), false positive (FP), false negative (FN) metrics:

  • Accuracy - Describes the model performance across all classes as the ratio of the number of correct predictions to the total number of predictions: (TP+TN)/(TP+FP+FN+TN).

  • Precision - Describes what proportion of positive predictions were correct: TP/(TP+FP).

  • Recall - Describes what proportion of actual positives were predicted correctly: TP/(TP+FN).

  • Mean Absolute Error (MAE) average absolute error between actual and predicted Categorical Action Probabilities (CAP) over the whole dataset. - CAP is the prediction probability for each class of the action feature.
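
To make the TP/TN/FP/FN definitions concrete, here is a minimal sketch that counts them for one class treated as "positive" and derives accuracy, precision and recall. The helper name `binary_classification_metrics` and the fuel-type labels are made up for illustration, and how Howso aggregates these metrics across several classes is not shown here.

```python
def binary_classification_metrics(actual, predicted, positive):
    """Accuracy, precision and recall, treating `positive` as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)

    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Precision and recall are undefined when their denominators are zero
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")

    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "accuracy": accuracy, "precision": precision, "recall": recall}


# Made-up labels, for illustration only
actual = ["Premium", "Premium", "Premium", "Premium",
          "Regular", "Regular", "Regular", "Regular"]
predicted = ["Premium", "Premium", "Premium", "Regular",
             "Premium", "Premium", "Regular", "Regular"]
class_metrics = binary_classification_metrics(actual, predicted, positive="Premium")
print(class_metrics)
```

In this toy data, precision and recall differ because the two kinds of mistakes (false positives and false negatives) occur at different rates.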

React to New Cases - Lastly, we will simply request the Trainee to react() to new cases we present to it, giving us predictions of what the Highway MPG would be.

How-To Guide#

We want to predict the Highway MPG and the Fuel Type of a new vehicle based on a Trainee we create from the vehicles dataset. In this guide, we will directly show the code for Highway MPG prediction while including the code for Fuel Type as comments wherever the code differs.

Step 1 - Load Libraries#

[1]:
import pandas as pd
import matplotlib.pyplot as plt

from howso.engine import Trainee
from howso.utilities import infer_feature_attributes

Step 2 - Load Data#

Using a pandas DataFrame, load the vehicles dataset from the CSV file. We drop the Make and Model features because knowing them would largely give away the answer. To make sure the data is what you expect, take a quick look at it and use describe() to confirm that the row count and value ranges look right.

[2]:
df = pd.read_csv("../../_assets/vehicles.csv")
df = df.drop(['Make', 'Model'], axis=1)
df.describe()
[2]:
            CityMPG    HighwayMPG          Year  PassengerVolume  LuggageVolume
count  23606.000000  23606.000000  23606.000000     23606.000000   23606.000000
mean      20.612683     27.742481   2002.669067        91.378633      16.023850
std       10.013758      8.719808     11.749509        11.229488       7.980971
min        6.000000      9.000000   1984.000000         1.000000       1.000000
25%       17.000000     24.000000   1992.000000        85.000000      12.000000
50%       19.000000     26.000000   2004.000000        91.000000      14.000000
75%       22.000000     30.000000   2013.000000        98.000000      17.000000
max      150.000000    133.000000   2022.000000       195.000000      55.000000

Step 3 - Define Features#

Howso can auto-detect feature attributes from data using infer_feature_attributes(), but it is best practice to review and adjust the result. In this tutorial, we will proceed as if the features were not detected the way we want them to be, so we will make the necessary adjustments.

Note

Howso automatically determines whether to perform a regression or classification task by the feature attributes of the action feature you are trying to predict, specifically the feature type as shown below, thus it is very important to make sure that the feature types are correct.

[3]:
# Auto detect features
features = infer_feature_attributes(df)

# For Regression, we will set `HighwayMPG` feature type to continuous
features['HighwayMPG']['type'] = 'continuous'

# For Classification, we will set `FuelType` feature type to nominal
features['FuelType']['type'] = 'nominal'

# We will also set these context features to continuous
features['CityMPG']['type'] = 'continuous'
features['Year']['type'] = 'continuous'
features['PassengerVolume']['type'] = 'continuous'
features['LuggageVolume']['type'] = 'continuous'
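
The note in this step says the task follows from the action feature's type. The sketch below spells out that rule with a plain dictionary standing in for the inferred feature attributes. The helper `task_for` is hypothetical and is not part of the Howso API; it only covers the two feature types used in this guide.

```python
def task_for(feature_attributes, action_feature):
    """Return the task implied by the action feature's type (hypothetical helper)."""
    feature_type = feature_attributes[action_feature]["type"]
    if feature_type == "continuous":
        return "regression"
    if feature_type == "nominal":
        return "classification"
    raise ValueError(f"type {feature_type!r} is not covered by this sketch")


# Plain dict mimicking the two action features configured in this guide
features_sketch = {
    "HighwayMPG": {"type": "continuous"},
    "FuelType": {"type": "nominal"},
}

print(task_for(features_sketch, "HighwayMPG"))
print(task_for(features_sketch, "FuelType"))
```

A quick check like this before training can catch a numeric-looking category that was mistakenly typed as continuous, or the reverse.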

Step 4 - Create a Trainee, Train, and Analyze#

Next we will create a Trainee, train() it on the data we have loaded into the DataFrame from vehicles.csv, and then analyze() it.

[4]:
# Create a new Trainee, specify features
trainee = Trainee(features=features)

# Train the Trainee on the DataFrame
trainee.train(df)

# Analyze the trained data to improve the Trainee's prediction performance
trainee.analyze()

Step 5 - Generate Accuracy Metrics#

Review the accuracy of the Trainee by using the built-in react_aggregate() method, which performs a react() on each of the cases that are trained into the model. Since this is a regression task, we can then evaluate accuracy with the returned R-Squared (\(R^2\)), Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) metrics.

[5]:
# Recommended metrics
stats = trainee.react_aggregate(
    action_feature="HighwayMPG",
    details={
        'prediction_stats': True,
        'selected_prediction_stats': ['rmse', 'spearman_coeff', 'r2', 'mae']
    }
)
stats
[5]:
{'r2': {'PassengerVolume': 0.9758335480610507,
  'LuggageVolume': 0.9740611547080207,
  'Year': 0.922227785637783,
  'CityMPG': 0.9846006871276185,
  'HighwayMPG': 0.9812427590448011},
 'mae': {'FuelType': 0.11656934603140044,
  'PassengerVolume': 1.0510220286844068,
  'CityMPG': 0.7212370122031748,
  'Year': 2.158294170139518,
  'LuggageVolume': 0.7003791740629839,
  'DriveType': 0.15625106234158273,
  'VehicleClass': 0.034411853212467966,
  'HighwayMPG': 0.8385851534713563},
 'spearman_coeff': {'PassengerVolume': 0.9770519150761656,
  'LuggageVolume': 0.9559640364411679,
  'Year': 0.952809251870153,
  'CityMPG': 0.9670920300050873,
  'HighwayMPG': 0.9639733941439603},
 'rmse': {'PassengerVolume': 1.7572699629274784,
  'LuggageVolume': 1.2479238810344317,
  'Year': 3.2943853521834607,
  'CityMPG': 1.1630592302738443,
  'HighwayMPG': 1.14197390760248}}
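
The leave-one-out idea behind these statistics can be sketched in a few lines: each case is held out in turn and predicted from all the remaining cases. The toy nearest-neighbor predictor, the function names and the (CityMPG, HighwayMPG) pairs below are assumptions for illustration; they are not Howso's implementation or data.

```python
def leave_one_out_predictions(cases, predict):
    """For each case, predict its action value using every *other* case."""
    predictions = []
    for i, (context, _actual) in enumerate(cases):
        others = cases[:i] + cases[i + 1:]  # hold out case i
        predictions.append(predict(others, context))
    return predictions


def nearest_neighbor(others, context):
    """Toy stand-in predictor: action value of the closest remaining case."""
    closest = min(others, key=lambda case: abs(case[0] - context))
    return closest[1]


# Made-up (CityMPG, HighwayMPG) pairs, for illustration only
cases = [(15, 22), (17, 25), (21, 29), (30, 38)]
predictions = leave_one_out_predictions(cases, nearest_neighbor)

# Compare each held-out prediction with the value that was actually recorded
loo_mae = sum(abs(actual - pred)
              for (_, actual), pred in zip(cases, predictions)) / len(cases)
print(predictions, loo_mae)
```

Since no case is ever used to predict itself, the resulting error is an estimate of performance on unseen data without needing a separate test split.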

Step 6 - Review Accuracy Metrics#

We see the Trainee has a very good fit for predicting Highway MPG, with an \(R^2\) of about 0.98, which suggests the Trainee should be effective at predicting Highway MPG for new cases.

Step 7 - React to New Cases#

We now present a new vehicle to the Trainee and ask it to react(), using every feature except HighwayMPG as context.

[6]:
data = {
    'Year': [2022],
    'DriveType': ['All-Wheel Drive'],
    'FuelType' : ['Premium'],
    'VehicleClass': ['Midsize Cars'],
    'CityMPG': [21],
    'PassengerVolume': [95],
    'LuggageVolume': [23]
}

test_case = pd.DataFrame(data)

result = trainee.react(
    test_case,
    action_features=["HighwayMPG"],
    context_features=features.get_names(without=["HighwayMPG"]),
)

Note

The method Trainee.predict() can also be used for predictions instead of Trainee.react(). Trainee.predict() serves as a convenience function that eliminates the extra output if all you want is the prediction.

Step 8 - Review Prediction#

Reviewing the prediction shows HighwayMPG of 29.

[7]:
result['action']
[7]:
HighwayMPG
0 29
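
For the Fuel Type variant mentioned at the start of this guide, the react() call would presumably differ only in which feature is the action feature. The sketch below wraps the call from Step 6 in a hypothetical helper, `react_for`, so the action feature can be swapped. It assumes `trainee` and `features` are the objects created earlier, and that the new case supplies a HighwayMPG value instead of a FuelType value.

```python
def react_for(trainee, features, case, action_feature):
    """React to `case`, predicting `action_feature` from all other features.

    Hypothetical convenience wrapper around the react() call shown above.
    """
    return trainee.react(
        case,
        action_features=[action_feature],
        context_features=features.get_names(without=[action_feature]),
    )


# Usage with the objects from this guide (not run here):
#   regression:      react_for(trainee, features, test_case, "HighwayMPG")
#   classification:  react_for(trainee, features, fuel_case, "FuelType")
# where `fuel_case` is an assumed DataFrame that includes HighwayMPG
# and omits FuelType.
```

As with the regression case, the prediction would be read from the 'action' entry of the result.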

API References#