Combining PyCaret and TimeMachines for Time-Series Prediction

The popular open-source PyCaret package provides automated machine learning capability, allowing the user to search across many regression models. The TimeMachines package provides a variety of incremental (online) time-series algorithms. In this short post, we cover the nuts and bolts of using these two libraries together. We shall:

  1. Grab a live time series from microprediction
  2. Fit with pycaret
  3. Run some timemachines models
  4. Fit with pycaret again

All code is provided in https://github.com/microprediction/timeseries-notebooks/blob/main/pycaret_microprediction_timemachines.ipynb. Let’s get a few preliminaries out of the way.

!pip install pycaret[full]
!pip install --upgrade statsmodels
!pip install microprediction
!pip install timemachines

If the installs fail, I recommend upgrading pip first:

!pip install --upgrade pip

Retrieving live time-series data from microprediction.org

In case you are not aware, microprediction hosts “live” contributed time series. You can browse the listing and, should you wish, enter live time-series competitions. PyCaret, on the other hand, is intended for “offline” analysis. Here I’m merely providing a walk-through of offline analysis, with the caveat that real-time deployment has its own challenges.

There are just a few things to be aware of when retrieving data from microprediction using the MicroReader:

  • Time series are live, so each time you run this the data will be different
  • Time series are returned as lagged values, so you need to reverse them for chronological ordering
  • Time is measured in epoch seconds at microprediction
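To make the last two points concrete, here is a minimal sketch that uses made-up numbers instead of a live stream, so it runs offline. It assumes, as described above, that the newest observation comes first in both the lagged values and the lagged times. Note that datetime.fromtimestamp(s) without a timezone returns local time; the sketch passes UTC explicitly so the result does not depend on the machine it runs on.

```python
from datetime import datetime, timezone

# Made-up data in the "lagged" order described above: newest observation first.
lagged_values = [3.0, 2.0, 1.0]
lagged_seconds = [1_600_000_120.0, 1_600_000_060.0, 1_600_000_000.0]

# Reverse both lists so they run oldest -> newest (chronological order).
values = list(reversed(lagged_values))
seconds = list(reversed(lagged_seconds))

# Convert epoch seconds to timezone-aware datetimes (UTC).
dt = [datetime.fromtimestamp(s, tz=timezone.utc) for s in seconds]

print(values)             # [1.0, 2.0, 3.0]
print(dt[0].isoformat())  # 2020-09-13T12:26:40+00:00
```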

There is a video tutorial on retrieving historical data should you need it, but the process is, I hope, fairly straightforward, and it requires no credentials. First, the imports:

import microprediction
from datetime import datetime, timedelta
from microprediction import MicroReader
import random
import matplotlib.pyplot as plt
import pandas as pd

Now grab the data and take a peek:

mr = MicroReader()
all_streams = mr.get_stream_names()
# Keep sampling random streams until we find one with at least 900 observations
lagged_values = []
while len(lagged_values) < 900:
    a_stream = random.choice(all_streams)
    lagged_values, lagged_seconds = mr.get_lagged_values_and_times(a_stream)
# Reverse into chronological order and convert epoch seconds to datetimes
values = list(reversed(lagged_values))
dt = [ datetime.fromtimestamp(s) for s in reversed(lagged_seconds)]
plt.plot(dt,values)
plt.title(a_stream)
plt.show()

Here’s an example:

[Figure: plot of a randomly chosen microprediction stream]

Using PyCaret

There’s a new time-series module coming to pycaret soon, but for now let’s lean on the regression capabilities. Let’s massage the data into a PyCaret-friendly format, illustrating how you might create features as you go.

df = pd.DataFrame({'date': dt, 'y': values})
# Calendar features derived from the timestamps
df['dayofweek'] = df['date'].dt.dayofweek
df['hour'] = df['date'].dt.hour
# Lagged copies of the target: y_1 is the previous value, y_2 the one before, and so on
num_lags = 10
lags = range(1, num_lags + 1)
lag_names = [ 'y_'+str(lag) for lag in lags ]
for lag, lag_name in zip(lags, lag_names):
    df[lag_name] = df['y'].shift(lag)
numerical_features = lag_names
categorical_features = ['dayofweek','hour']
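If the direction of shift is ever in doubt, a toy frame makes it obvious. The sketch below uses a five-point series standing in for df['y'] and three lags instead of ten; it shows that shift(lag) moves values down, so row t of y_1 holds y[t-1], and that the first lag rows of each lag column are missing because no earlier value exists.

```python
import pandas as pd

# Toy series standing in for df['y'].
df = pd.DataFrame({"y": [10.0, 11.0, 12.0, 13.0, 14.0]})

num_lags = 3
lags = range(1, num_lags + 1)
lag_names = [f"y_{lag}" for lag in lags]
for lag, lag_name in zip(lags, lag_names):
    # shift(lag) moves values down: row t of y_<lag> holds y[t - lag].
    df[lag_name] = df["y"].shift(lag)

# The first `lag` rows of each lag column are NaN.
print(df)
```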

Then, thanks to PyCaret’s convenience, there isn’t much more to do.

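from pycaret.regression import setup, compare_models

# Written against the PyCaret 2.x API (PyCaret 3 dropped some of these arguments, e.g. silent)
s = setup(df, target = 'y', train_size = 0.95,
          data_split_shuffle = False, fold_strategy = 'timeseries', fold = 3,
          ignore_features = ['date'],
          numeric_features = numerical_features,
          categorical_features = categorical_features,
          silent = True, verbose = False, session_id = 123)
top5 = compare_models(n_select = 5)   # keep the five best models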

You should see a model leaderboard ranking the candidate models by their cross-validated error metrics (MAE, MSE, RMSE, R2 and so on).
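Because shuffling is off, train_size = 0.95 holds out the last 5% of rows, and fold_strategy = 'timeseries' cross-validates on the rest with expanding windows. The sketch below mimics that layout in plain Python. It assumes the folds follow scikit-learn's TimeSeriesSplit arithmetic (which, to my understanding, is what PyCaret uses for this strategy) and that the holdout size is rounded to the nearest row; PyCaret's exact rounding may differ. The helper name chronological_splits is hypothetical.

```python
def chronological_splits(n_samples, train_size=0.95, n_folds=3):
    """Hypothetical helper: chronological holdout plus expanding-window CV folds.

    Returns (cv_folds, holdout), where cv_folds is a list of (train, test)
    index ranges. Nothing is shuffled, so every test block lies strictly
    after its training block, and the holdout lies after all of them.
    """
    n_train = int(round(n_samples * train_size))  # rows available for CV (rounding assumed)
    holdout = range(n_train, n_samples)           # final rows, never used in CV
    test_size = n_train // (n_folds + 1)          # fold size, as in TimeSeriesSplit
    cv_folds = []
    for i in range(n_folds):
        test_start = n_train - (n_folds - i) * test_size
        cv_folds.append((range(0, test_start), range(test_start, test_start + test_size)))
    return cv_folds, holdout


folds, holdout = chronological_splits(1000)
for train, test in folds:
    print(f"train [0, {train.stop})  test [{test.start}, {test.stop})")
print(f"holdout [{holdout.start}, {holdout.stop})")
```

Each fold trains on everything before its test block, which is why the training windows grow from fold to fold.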

Using TimeMachines models as features

Next, we will run a few univariate time-series models to generate features. We import them and use the “prior” function to generate one-step-ahead predictions, which are added as new columns.

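import numpy as np
from timemachines.skaters.allskaters import EMA_SKATERS, DLM_SKATERS, TSA_SKATERS, HYPOCRATIC_ENSEMBLE_SKATERS
from timemachines.skating import prior
skaters = EMA_SKATERS + DLM_SKATERS + TSA_SKATERS + HYPOCRATIC_ENSEMBLE_SKATERS
skater_names = [ f.__name__ for f in skaters ]
y = df['y'].values
for f, skater_name in zip(skaters, skater_names):
    print('Running '+skater_name)
    x, x_std = prior(f, y=y, k=1)   # Runs a time-series model forward, one point at a time
    x_next = np.asarray(x).reshape(len(y), -1)[:, 0]   # one-step-ahead forecasts as a flat array
    # Assuming x[t] is the forecast of y[t+1] made after seeing y[t], shift by one row
    # so that the feature in row t only uses data up to t-1 (no look-ahead)
    df[skater_name] = pd.Series(x_next, index=df.index).shift(1)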

This step can take a while. That’s why the timemachines library exists: to provide fast incremental time-series models (in addition to exposing others whose speed is what it is). If you want it to be even more painful, try including the most popular time-series packages released by major companies :)
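To illustrate what “incremental” means here, and why the alignment of the generated columns matters, here is a toy exponential moving average written in the skater style: each call receives the latest observation and a state, updates the state in constant time, and returns a forecast of the next k values. Both toy_ema_skater and toy_prior are hypothetical stand-ins (the real timemachines skaters have a richer signature); the sketch assumes, as I understand the convention, that the forecast produced after seeing y[t] targets y[t+1], so it belongs in row t+1 when used as a feature for predicting y.

```python
import math


def toy_ema_skater(y, s, k=1, alpha=0.3):
    """Hypothetical skater-style EMA: consume y[t], return forecasts of y[t+1..t+k]."""
    if not s:
        # First observation: initialise the state with it.
        s = {"mean": y}
    else:
        # Incremental O(1) update; no refitting over the whole history.
        s["mean"] = alpha * y + (1 - alpha) * s["mean"]
    x = [s["mean"]] * k           # flat point forecast at every horizon
    x_std = [math.nan] * k        # this toy has no uncertainty estimate
    return x, x_std, s


def toy_prior(f, y, k=1):
    """Hypothetical driver: run the skater forward and collect each forecast."""
    s, xs = {}, []
    for yi in y:
        x, _, s = f(yi, s, k)
        xs.append(x)
    return xs


y = [1.0, 2.0, 3.0, 4.0]
xs = toy_prior(toy_ema_skater, y)

# xs[t][0] was produced after seeing y[t], so it is a forecast of y[t+1].
# Shifting by one row means the feature in row t only uses y[0..t-1].
feature = [math.nan] + [x[0] for x in xs[:-1]]
print(feature)
```

The state dictionary is the whole model, which is what keeps each update cheap compared with refitting on the full history at every step.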

But now we can re-run pycaret as before, with new features added.

s = setup(df, target = 'y', train_size = 0.95,
data_split_shuffle = False, fold_strategy = 'timeseries', fold = 3,
ignore_features = ['date'],
numeric_features = numerical_features + skater_names,
categorical_features = categorical_features,
silent = True, verbose = False, session_id = 123)
top5again = compare_models(n_select = 5)

And that’s about all there is to it. If your example is like mine, you’ll see a small reduction in the reported errors. Of course, that will depend on the time-series you happened to choose. And as they say, past performance is no guarantee of future returns.

Chief Data Scientist, Intech Investments
Microprediction