ŷhat

Example Twitter Feed

For our example use case, we'll build a model that predicts whether a tweet will get more than one retweet or favorite in total, based on the content at the URL being shared (we'll assume that every tweet includes a URL).

Getting the data / ŷhat's Twitter feed

We're going to use ŷhat's own Twitter timeline for the example, but feel free to sub in your own! You can download the data here.

Building the model

We're going to use pandas and scikit-learn to build our classifier.

import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# load in our data and split it into a training and test set
df = pd.read_csv("tweets_with_html.csv")
df['total_favs_and_rts'] = df.favorite_count + df.retweet_count
df['liked_content'] = np.where(df.total_favs_and_rts > 1, 1, 0)

df['istrain'] = np.random.uniform(size=len(df)) <= 0.8
train = df[df.istrain]
test  = df[~df.istrain]  # ~ negates the boolean mask

vec = TfidfVectorizer(max_features=200)
train_twitter_tfidf = vec.fit_transform(train.text)

# create and train a classifier
nbayes = MultinomialNB(fit_prior=False)
nbayes.fit(train_twitter_tfidf, train.liked_content.tolist())

# prep the test data, then create a confusion matrix to examine the results
test_twitter_tfidf = vec.transform(test.text)
preds = nbayes.predict(test_twitter_tfidf)
print(pd.crosstab(test.liked_content, preds))
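
To make the labelling and splitting steps above concrete, here is a minimal sketch on a small, made-up DataFrame. The toy rows, the fixed random seed, and the names toy, train_toy, and test_toy are assumptions for illustration only; they are not part of the downloadable dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows standing in for tweets_with_html.csv
toy = pd.DataFrame({
    "text": ["post a", "post b", "post c", "post d",
             "post e", "post f", "post g", "post h"],
    "favorite_count": [0, 1, 3, 0, 2, 0, 1, 4],
    "retweet_count":  [0, 1, 0, 1, 5, 0, 0, 2],
})

# Same labelling rule as above: more than one favorite/retweet in total
toy["total_favs_and_rts"] = toy.favorite_count + toy.retweet_count
toy["liked_content"] = np.where(toy.total_favs_and_rts > 1, 1, 0)

# Roughly 80/20 split; a seeded generator keeps the split repeatable
rng = np.random.RandomState(0)
toy["istrain"] = rng.uniform(size=len(toy)) <= 0.8
train_toy = toy[toy.istrain]
test_toy = toy[~toy.istrain]  # rows not selected for training
```

Because the mask is drawn at random, the split is only approximately 80/20, and a very small dataset can end up with few or no test rows.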

Wrap it in ŷhat

Now that we have a model we're going to use the ŷhat Python Client to deploy it as a REST, streaming, and batch API. To do this, we're going to extend the ŷhat YhatModel class to make a custom model. Think of a YhatModel as a data pipeline with individual steps.

Defining execute

execute is where you write your code. You can use any functions from your script to operate on the data. In this example, we invoke our nbayes model and then format the response as a dictionary. The incoming request is formatted as:

{
    "url": "http://blog.yhathq.com/posts/ggplot-for-python.html",
    "tweet_content": "Analytical projects often begin w/ exploration--namely plotting distributions to find patterns of interest and importance. Andwhile there are dozens of reasons to add R and Python to your toolbox, it was the supperior visualization faculties that spurred my own investment in these tools..."
}

preprocess input and output format

preprocess lets you define the types of your input and output, and ŷhat will format your output accordingly. Currently we support dictionaries (dict) and pandas DataFrames (DataFrame).

from yhat import Yhat, YhatModel, preprocess

class TwitterRanker(YhatModel):

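    @preprocess(in_type=dict, out_type=dict)
    def execute(self, data):
        tweet = data['tweet_content']
        features = vec.transform([tweet])
        pred = nbayes.predict(features)
        prob = nbayes.predict_proba(features)
        # "ham" is P(class 0, not liked); "spam" is P(class 1, liked)
        prob = {
            "ham": round(float(prob[0][0]), 4),
            "spam": round(float(prob[0][1]), 4)
        }
        # cast to a plain int so the prediction serializes cleanly to JSON
        return { "pred": int(pred[0]), "prob": prob }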
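
The columns returned by predict_proba follow the order of the classifier's classes_ attribute, which is how the probabilities in execute map back to the 0/1 labels. The sketch below shows that mapping on a tiny, made-up training set; the toy texts, labels, and the names toy_vec, toy_model, and prob_by_class are assumptions for illustration and are independent of the ŷhat client.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy tweets and labels (1 = liked, 0 = not liked)
toy_texts = [
    "new blog post on ggplot for python",
    "plotting distributions with ggplot",
    "server maintenance tonight",
    "maintenance window rescheduled",
]
toy_labels = [1, 1, 0, 0]

toy_vec = TfidfVectorizer()
toy_model = MultinomialNB(fit_prior=False)
toy_model.fit(toy_vec.fit_transform(toy_texts), toy_labels)

# Vectorize one incoming text with the already-fitted vectorizer
features = toy_vec.transform(["ggplot plotting tips"])
proba_row = toy_model.predict_proba(features)[0]

# Pair each class label with its probability using classes_
prob_by_class = {
    int(label): round(float(p), 4)
    for label, p in zip(toy_model.classes_, proba_row)
}
toy_pred = int(toy_model.predict(features)[0])
```

Keying the probabilities by class label, rather than by fixed names, avoids having to remember which column belongs to which class.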

Testing your Model Locally

Now that you've created a TwitterRanker class, you can test it locally. The run method will open a session where you can pass data to your model and see the results.

>>> TwitterRanker().run()

# Paste your test data here
# Data should be formatted as valid JSON.
# Hit <ENTER> to execute your model
# Press <CTRL + C> or type 'quit' to exit
# ========================================

[In]  { "url": "http://blog.yhathq.com/posts/ggplot-for-python.html", "tweet_content": "Analytical projects often begin w/ exploration--namely plotting distributions to find patterns of interest and importance. Andwhile there are dozens of reasons to add R and Python to your toolbox, it was the supperior visualization faculties that spurred my own investment in these tools..." }

[Out]  {'pred': 1, 'prob': {'ham': 0.4088, 'spam': 0.5912}}

Note: The JSON object above was entered as a single line. For local testing, each object you pass must be on one line.
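
One way to get a single-line JSON string to paste into the local session is the standard library's json.dumps, which emits no line breaks unless you ask for indentation. This is a small sketch; the request dictionary uses a shortened, made-up tweet_content value, and the names request and one_line are illustrative.

```python
import json

# Shortened, illustrative request in the same shape as the example above
request = {
    "url": "http://blog.yhathq.com/posts/ggplot-for-python.html",
    "tweet_content": "Analytical projects often begin with exploration.",
}

# With no indent argument, json.dumps returns the object on one line
one_line = json.dumps(request)
print(one_line)
```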

Deploy

The hard part is over. Now that we have our own model class, TwitterRanker, it's time to deploy it!

Create a connection to the ŷhat server using the Yhat class, passing the URL of your server as the third argument. Then call the deploy method of Yhat, passing in a name for your model, your model class, and your global environment. Passing globals() is required so that Yhat can parse your global environment for the requirements, functions, and variables your model needs to run.

yh = Yhat("YOUR_USERNAME", "YOUR_APIKEY", "http://cloud.yhathq.com/")
yh.deploy("twitterRanker", TwitterRanker, globals())
# {"status": "success"}

Download this model

You can download the full file here.