Example Random Forest

Predicting the Quality of Wines

We're going to build a web app that predicts the quality of a bottle of red wine. For information on the dataset, visit the UCI Machine Learning Repository, where you can also download the data.

Required Packages

We're going to use the RandomForestClassifier from sklearn, both for feature selection and for the final model, along with pandas and numpy for data handling.

from sklearn.ensemble import RandomForestClassifier
import pandas as pd
import numpy as np

df = pd.read_csv("winequality-red.csv", sep=";")
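
The feature selection step itself is not shown in this walkthrough. One common way to do it with a random forest is to fit on all columns and rank them by the fitted model's feature_importances_ attribute. The sketch below assumes that approach; the helper name rank_features and the synthetic data are illustrative and not part of the original example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def rank_features(frame, target, n_estimators=100, seed=0):
    """Return random forest feature importances, highest first (hypothetical helper)."""
    X = frame.drop(columns=[target])
    forest = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
    forest.fit(X, frame[target])
    importances = pd.Series(forest.feature_importances_, index=X.columns)
    return importances.sort_values(ascending=False)


# Synthetic stand-in for the wine data (an assumption):
# "signal" determines the label, "noise" is unrelated to it.
rng = np.random.RandomState(0)
demo = pd.DataFrame({
    "signal": rng.uniform(0, 1, 300),
    "noise": rng.uniform(0, 1, 300),
})
demo["quality"] = (demo["signal"] > 0.5).astype(int)

ranking = rank_features(demo, "quality")
print(ranking)
```

With the wine data loaded as above, rank_features(df, "quality").head(4) would list the four highest-ranked columns. The exact ranking can vary with the random seed.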

Creating a Training/Test Set

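We're going to randomly split our data into roughly 75/25 train/test sets. Each row is assigned to the training set with probability 0.75, so the exact proportions vary from run to run.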

df['is_train'] = np.random.uniform(0, 1, len(df)) <= .75
train = df[df['is_train']]
test = df[~df['is_train']]
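
Because the split draws fresh random numbers each time, train and test change on every run. If you want a repeatable split, one option is to seed a random generator first. The sketch below wraps the same technique in a function; the name split_train_test, the seed value, and the demo frame are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def split_train_test(frame, train_fraction=0.75, seed=42):
    """Randomly assign each row to train or test; the fraction is approximate."""
    rng = np.random.RandomState(seed)
    is_train = rng.uniform(0, 1, len(frame)) <= train_fraction
    return frame[is_train], frame[~is_train]


# Illustrative data only; with the wine data you would pass df instead.
demo = pd.DataFrame({"x": np.arange(1000)})
train_demo, test_demo = split_train_test(demo)
print(len(train_demo), len(test_demo))
```

Every row lands in exactly one of the two sets, and calling the function again with the same seed returns the same split.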

Fitting a Model and Evaluating Fit

We're going to train our model using only the best features and then evaluate how well it performs. The features we are using are sulphates, density, volatile acidity, and total sulfur dioxide.

features = ['sulphates', 'density', 'volatile acidity', 'total sulfur dioxide']
clf = RandomForestClassifier(n_jobs=2)
clf.fit(train[features], train['quality'])
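
The evaluation step is not shown above. A minimal sketch of one way to do it is to predict on the held-out rows, compute accuracy, and cross-tabulate actual against predicted labels. The data here is synthetic and the names demo, demo_train, demo_test, and demo_features are assumptions. With the wine data you would use clf, test, and features instead.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in data (an assumption), split into train and test rows.
rng = np.random.RandomState(0)
demo = pd.DataFrame({"a": rng.uniform(0, 1, 400), "b": rng.uniform(0, 1, 400)})
demo["quality"] = (demo["a"] + demo["b"] > 1).astype(int)
demo_train, demo_test = demo.iloc[:300], demo.iloc[300:]
demo_features = ["a", "b"]

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(demo_train[demo_features], demo_train["quality"])

# Score the model on rows it did not see during fitting.
predicted = model.predict(demo_test[demo_features])
accuracy = accuracy_score(demo_test["quality"], predicted)

# Rows are actual labels, columns are predicted labels.
confusion = pd.crosstab(demo_test["quality"], predicted,
                        rownames=["actual"], colnames=["predicted"])
print(accuracy)
print(confusion)
```

The cross-tabulation is often more informative than accuracy alone for wine quality, since it shows which scores get confused with which.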

Prepping for Deployment

Deploying to ŷhat is simple in this case. The preprocess decorator lets you define the structure of your input and output through in_type and out_type, and ŷhat will format your data accordingly. Currently we support dictionaries (dict) and pandas DataFrames (pd.DataFrame).

from yhat import Yhat, YhatModel, preprocess

class WinePredictor(YhatModel):

    @preprocess(in_type=pd.DataFrame, out_type=pd.DataFrame)
    def execute(self, data):
        result = clf.predict(data[features])
        df = pd.DataFrame(data={'predicted_quality': result})
        return df

Testing your Model Locally

Now that you've created a WinePredictor class, you can test it locally. The run method opens an interactive session where you can pass data to your model and see the results.

>>> import json
>>> top_item = test[features].head(1)
>>> testcase = top_item.to_dict("list")
>>> json.dumps(testcase)
# '{"sulphates": [0.56000000000000005], "volatile acidity": [0.69999999999999996], "total sulfur dioxide": [34.0], "density": [0.99780000000000002]}'
>>> WinePredictor().run(json.dumps(testcase))

# Paste your test data here
# Data should be formatted as valid JSON.
# Hit <ENTER> to execute your model
# Press <CTRL + C> or type 'quit' to exit
# ========================================

[In] {"sulphates": [0.56000000000000005], "volatile acidity": [0.69999999999999996], "total sulfur dioxide": [34.0], "density": [0.99780000000000002]}

0     5

[1 rows x 1 columns]

Note: The JSON object above was entered on a single line. For local testing, each object you pass must be on one line.
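
To see why the test case has this shape, the sketch below round-trips a one-row frame through JSON and rebuilds it with pandas. It only illustrates the data format. How ŷhat performs the conversion internally is not covered here, and the values in the frame are illustrative.

```python
import json

import pandas as pd

# A one-row frame shaped like the test case above (illustrative values).
top_item = pd.DataFrame({"sulphates": [0.56], "density": [0.9978]})

# to_dict("list") maps each column name to a list of values,
# which json.dumps can serialize onto a single line.
payload = json.dumps(top_item.to_dict("list"))

# Rebuilding a frame from the decoded dictionary recovers the columns and values.
restored = pd.DataFrame(json.loads(payload))
print(payload)
print(restored)
```

Since json.dumps emits no line breaks by default, the resulting string can be pasted straight into the local test session.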

Deploying to ŷhat

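The actual deployment is painless: create a Yhat connection with your username and API key, then call deploy with a name for the model, the model class, and globals().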

yh = Yhat("YOUR_USERNAME", "YOUR_APIKEY", "http://cloud.yhathq.com/")
yh.deploy("winePredictor", WinePredictor, globals())
# {"status": "success"}