Oct 15, 2021
River is an online machine learning package for Python. It supports a variety of machine learning problems, including regression and classification.
River was created by merging Creme and Scikit-Multiflow and is intended for data that is generated continuously. This blog demonstrates, step by step, how to use River on an example data set, starting from its installation. The basics of River are discussed in detail in my previous blog. If you haven't read it, I recommend doing so first for a better understanding.
Installation
Installing River is the same as installing any other Python package. You can use pip to install it in your environment.
pip install river
Functions
Let us take an example and build a simple text classification model using River. The aim is to classify a given text by the emotion it expresses, either happy or sad. For this purpose, we will use BagOfWords() as the transformer that converts text into features and the Naive Bayes classifier MultinomialNB to model the data. So, we first import the required packages.
from river.naive_bayes import MultinomialNB
from river.feature_extraction import BagOfWords, TFIDF
from river.compose import Pipeline
from river.metrics import ClassificationReport, Accuracy
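To make the transformer step more concrete, here is a rough standard-library sketch of what a bag-of-words transform produces: a dictionary mapping each token to its count. The function name bag_of_words and the simple tokenizing rule are my own assumptions for illustration; River's BagOfWords has its own tokenizer and options, so its output may differ in detail.

```python
import re
from collections import Counter


def bag_of_words(text, lowercase=True):
    """Map a text to a dict of token counts (illustrative only)."""
    if lowercase:
        text = text.lower()
    # Treat runs of word characters as tokens; River's tokenizer may differ.
    tokens = re.findall(r"\w+", text)
    return dict(Counter(tokens))


features = bag_of_words("He is happy as a pig in mud, happy indeed")
print(features)
```

A dictionary of counts like this is the kind of feature representation a multinomial Naive Bayes model can consume one example at a time.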
Let us create a machine learning pipeline and measure the performance of the model by its accuracy. We will use a sample emotion data set, which is just a list of tuples, each consisting of a text and its associated label. The goal is to classify each text as one of two emotions: happy or sad. The data could also be ingested from a streaming source or read from a CSV file. In the CSV case, the data frame has to be converted to a list of dictionaries or tuples using Pandas.
import pandas as pd
df = pd.read_csv("emotion_dataset.csv")
# Convert to a list of dictionaries, one per row
df.to_dict("records")
# Convert to a NumPy record array of row tuples
df.to_records(index=False)
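If you only need the (text, label) tuples, the same shape can also be produced with Python's built-in csv module. This is a minimal sketch: the column names "text" and "label" are assumptions about how such a file might be laid out, and the CSV content is inlined so the example runs without a file on disk.

```python
import csv
import io

# Stand-in for the contents of a CSV file; the column names "text" and
# "label" are assumptions made for this example.
csv_text = "text,label\nHe jumped with joy,happy\nBroke My Heart,sad\n"

# io.StringIO lets the example run without a real file; with a file on disk,
# pass the handle returned by open(path, newline="") instead.
reader = csv.DictReader(io.StringIO(csv_text))
data_from_csv = [(row["text"], row["label"]) for row in reader]
print(data_from_csv)
```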
Dataset
Here is the emotion data set that we can use to follow along with the steps:
data = [("My dreams have come true", "happy"),
        ("He is happy as a pig in mud", "happy"),
        ("Broke My Heart", "sad"),
        ("He jumped with joy", "happy"),
        ("Cut up about something", "sad"),
        ("Are you feeling down", "sad"),
        ("Friends, you made my day", "happy"),
        ("He walked away with heavy heart", "sad"),
        ("I'm so happy, I'm on cloud nine", "happy"),
        ("He is in sad state of mind", "sad"),
        ("You are on top of the world", "happy"),
        ("Why are you shedding tears", "sad"),
        ("She is walking over the moon", "happy"),
        ("She finds herself under the weather", "sad")]
Classification Model
We will now create a pipeline with two stages: the transformer stage and the estimator stage. Once the pipeline is built, we can inspect its stages through the steps attribute, or visualize it using the draw function.
pipe_nb = Pipeline(('vectorizer', BagOfWords(lowercase=True)),
                   ('nb', MultinomialNB()))
pipe_nb.steps
As the data arrives at runtime, we fit the model on one example at a time. For this purpose, we use the learn_one(x, y) function during training. We can simulate a stream by iterating through our data with a for loop.
for text, label in data:
    # learn_one updates the pipeline in place, so no reassignment is needed
    pipe_nb.learn_one(text, label)
Find Prediction Probability
To test the model on a sample input, we use the predict_one() function. This will return the class to which the sample input belongs. If you want to know the associated probability, you can also use the function predict_proba_one.
pipe_nb.predict_one('She is dancing with joy')
pipe_nb.predict_proba_one('She is dancing with joy')
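To give a feel for what learning and predicting one example at a time involves, here is a toy multinomial Naive Bayes written with the standard library only. It is a sketch under my own assumptions (whitespace tokenizing, Laplace smoothing with alpha = 1) and is not River's implementation. The class name TinyOnlineNB is hypothetical, and the method names simply mirror the ones used above.

```python
import math
from collections import Counter, defaultdict


class TinyOnlineNB:
    """Toy multinomial Naive Bayes that learns from one example at a time."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing strength
        self.class_counts = Counter()            # examples seen per label
        self.word_counts = defaultdict(Counter)  # word counts per label
        self.vocab = set()                       # all words seen so far

    def learn_one(self, text, label):
        # Learning is just updating counts, so each example is cheap.
        words = text.lower().split()
        self.class_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocab.update(words)

    def predict_proba_one(self, text):
        if not self.class_counts:
            return {}  # nothing learned yet
        words = text.lower().split()
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, n in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
            score = math.log(n / total)  # log prior
            for w in words:
                score += math.log((self.word_counts[label][w] + self.alpha) / denom)
            log_scores[label] = score
        # Normalize stably by subtracting the largest log-score first.
        top = max(log_scores.values())
        exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
        z = sum(exp_scores.values())
        return {k: v / z for k, v in exp_scores.items()}

    def predict_one(self, text):
        proba = self.predict_proba_one(text)
        return max(proba, key=proba.get) if proba else None


model = TinyOnlineNB()
for text, label in [("He jumped with joy", "happy"), ("Broke my heart", "sad"),
                    ("I'm so happy", "happy"), ("He is in sad state of mind", "sad")]:
    model.learn_one(text, label)

proba = model.predict_proba_one("She is dancing with joy")
print(model.predict_one("She is dancing with joy"), proba)
```

Because the model state is only a set of counters, there is no need to revisit earlier examples when a new one arrives, which is the property that makes this kind of model suitable for streams.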
Performance Measure of Model
Once the model is trained, we evaluate its performance using classes from the river.metrics sub-module, such as Accuracy and ClassificationReport. In the loop below, each prediction is made before the model learns from that example.
metric_accuracy = Accuracy()
for x, y in data:
    y_predict_before = pipe_nb.predict_one(x)
    # Check accuracy
    metric_accuracy.update(y, y_predict_before)
    pipe_nb.learn_one(x, y)
print("Final Accuracy", metric_accuracy)
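The predict-then-learn order in this loop is often called progressive (or test-then-train) validation. The sketch below shows the bookkeeping in plain Python. RunningAccuracy and MajorityClass are hypothetical stand-ins written for this illustration, not River classes, and the four-item stream is made up.

```python
class RunningAccuracy:
    """Accuracy updated one prediction at a time."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, y_true, y_pred):
        self.correct += int(y_true == y_pred)
        self.total += 1

    def get(self):
        return self.correct / self.total if self.total else 0.0


class MajorityClass:
    """Hypothetical stand-in model: predicts the most frequent label so far."""

    def __init__(self):
        self.counts = {}

    def predict_one(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else None

    def learn_one(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1


stream = [("a", "happy"), ("b", "happy"), ("c", "sad"), ("d", "happy")]
model, metric = MajorityClass(), RunningAccuracy()
for x, y in stream:
    y_pred = model.predict_one(x)  # 1. predict before seeing the label
    metric.update(y, y_pred)       # 2. score that prediction
    model.learn_one(x, y)          # 3. only then learn from the example
print("Progressive accuracy:", metric.get())
```

Since every example is scored before the model has seen its label, the running accuracy is an estimate on unseen data, even though the same stream is also used for training.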
Consider the following sample data, which we will use to check how well our model has fit the data.
sample_data = ["You colour me happy",
               "My friend cracked a smile to see me",
               "He cried his eyes out",
               "You are down in the dumps"]
y_actual = ["happy", "happy", "sad", "sad"]
Let’s loop through the sample data to predict its output.
y_predict = []
for i in sample_data:
    result = pipe_nb.predict_one(i)
    print("{}: Prediction: {}".format(i, result))
    y_predict.append(result)
To obtain the classification results in the form of a classification report, we use the sample test data above.
report = ClassificationReport()
for y_act, y_pred in zip(y_actual, y_predict):
    # update modifies the report in place, so no reassignment is needed
    report.update(y_act, y_pred)
report
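For readers curious about what goes into such a report, here is a small sketch that computes per-class precision, recall and F1 from paired labels using only the standard library. The helper per_class_report is my own illustrative function, and the predictions in y_predict_demo are hypothetical values chosen for the example, not actual output of the model above.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Return {label: (precision, recall, f1)} computed from paired labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for actual, predicted in zip(y_true, y_pred):
        if actual == predicted:
            tp[actual] += 1
        else:
            fp[predicted] += 1  # predicted label was wrongly assigned
            fn[actual] += 1     # actual label was missed
    report = {}
    for label in sorted(set(y_true) | set(y_pred), key=str):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        report[label] = (precision, recall, f1)
    return report


y_actual_demo = ["happy", "happy", "sad", "sad"]
# Hypothetical predictions chosen for illustration, not actual model output.
y_predict_demo = ["happy", "sad", "sad", "sad"]
demo_report = per_class_report(y_actual_demo, y_predict_demo)
for label, (p, r, f1) in demo_report.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```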
This is really interesting and super quick too. Isn’t it? I enjoyed exploring its functionalities. I hope you will also like it.