Combining PyCaret and TimeMachines for Time-Series Prediction

The popular open-source PyCaret package provides automated machine learning, allowing the user to search hundreds of regression models. The TimeMachines package provides a variety of incremental (online) time-series algorithms. In this short post, we cover the nuts and bolts of using these two libraries together. We shall:

  1. Grab a live time series from microprediction
  2. Fit with pycaret
  3. Run some timemachines models
  4. Fit with pycaret again

All code is provided in https://github.com/microprediction/timeseries-notebooks/blob/main/pycaret_microprediction_timemachines.ipynb. Let’s get a few preliminaries out of the way.

If you get stuck here, I recommend

Retrieving live time-series data from microprediction.org

In case you are not aware, microprediction hosts “live” contributed time-series. You can browse the listing and, should you wish, enter live time-series competitions. PyCaret, on the other hand, is intended for “offline” analysis of time-series. Here I’m merely providing a walk-through of offline analysis, with the caveat that deployment in real-time has its own challenges.

There are just a few things to be aware of when retrieving data from microprediction using the MicroReader:

  • Time series are live, so each time you run this the data will be different
  • Time series are returned as lagged values, so you need to reverse them for chronological ordering
  • Time is measured in epoch seconds at microprediction
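To make the last two points concrete, here is a minimal sketch using only the Python standard library. The helper names `to_chronological` and `epoch_to_utc` are my own, and the numbers are placeholders rather than values from any real stream. It assumes the lagged values and times arrive as two parallel lists with the most recent observation first.

```python
from datetime import datetime, timezone


def to_chronological(lagged_values, lagged_times):
    """Reverse most-recent-first lists so the oldest observation comes first."""
    return list(reversed(lagged_values)), list(reversed(lagged_times))


def epoch_to_utc(epoch_seconds):
    """Convert epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


# Placeholder data, most recent first (hypothetical, not from a real stream)
lagged_values = [3.0, 2.0, 1.0]
lagged_times = [1600000120.0, 1600000060.0, 1600000000.0]

values, times = to_chronological(lagged_values, lagged_times)
stamps = [epoch_to_utc(t) for t in times]

print(values)                 # oldest value first
print(stamps[0].isoformat())  # timestamp of the oldest observation
```

Reversing both lists together keeps each value paired with its timestamp, which matters once you start building features from the history.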

There is a video tutorial on retrieving the historical data should you need it, but the process is, I hope, fairly straightforward, and it requires no credentials. The imports…

Now grab the data and take a peek

Here’s an example:

Using PyCaret

There’s a new time-series module coming to pycaret soon, but for now let’s lean on the regression capabilities. Let’s massage the data into a PyCaret-friendly format, illustrating how you might create features as you go.
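One simple way to do that massaging is to turn the chronological series into rows, each holding the current value as the target alongside a few lagged values as features. The sketch below is an illustration under my own assumptions, not necessarily the layout used in the notebook: the function `make_lag_rows`, the column names `y` and `y_lag_1`, `y_lag_2`, and the sample values are all hypothetical.

```python
def make_lag_rows(values, num_lags=3):
    """Return one dict per time step: the target 'y' plus its previous num_lags values."""
    rows = []
    # Start at num_lags so that every row has a full set of lagged features
    for i in range(num_lags, len(values)):
        row = {'y': values[i]}
        for lag in range(1, num_lags + 1):
            row[f'y_lag_{lag}'] = values[i - lag]
        rows.append(row)
    return rows


# Placeholder chronological series (hypothetical values, oldest first)
values = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0]
rows = make_lag_rows(values, num_lags=2)

for row in rows:
    print(row)
```

A list of dicts like this can be passed straight to `pandas.DataFrame(rows)`, and the resulting frame handed to PyCaret's regression `setup` with `y` as the target. Note that every feature in a row is strictly older than the target, which avoids leaking the value being predicted.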

Thanks to the convenience of PyCaret, there isn’t much more to do.

You should see a model leaderboard like the following

Using TimeMachines models as features

Next, we will run a few univariate time-series models to generate features. We import them and use the “prior” method to generate one-step-ahead predictions. These are added as new columns.
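To show the mechanics without depending on any particular library version, here is a self-contained sketch of the idea. It does not call timemachines. Instead, `ema_skater` is a hypothetical stand-in (an exponential moving average) that imitates the general pattern of an incremental model: take one observation and a state, return a prediction, a rough scale estimate, and the updated state. The helper names and the `alpha` parameter are my own assumptions.

```python
def ema_skater(y, s, k=1, alpha=0.2):
    """Hypothetical stand-in model: exponential moving average of the series.

    Returns (prediction, rough std estimate, updated state). The argument k is
    accepted for illustration only; this sketch always forecasts the running mean.
    """
    if not s:
        # First observation: initialize the state
        s = {'mean': y, 'var': 0.0}
    else:
        err = y - s['mean']
        s['mean'] = s['mean'] + alpha * err
        s['var'] = (1 - alpha) * (s['var'] + alpha * err * err)
    return s['mean'], s['var'] ** 0.5, s


def one_step_ahead(skater, ys):
    """Feed ys in order; preds[i] is the forecast of ys[i+1] made after seeing ys[i]."""
    s, preds = {}, []
    for y in ys:
        x, _, s = skater(y=y, s=s, k=1)
        preds.append(x)
    return preds


def prediction_feature(skater, ys):
    """Shift forecasts by one step so row i only uses information available before ys[i]."""
    preds = one_step_ahead(skater, ys)
    return [None] + preds[:-1]


ys = [0.0, 10.0, 10.0]  # placeholder series (hypothetical values)
print(one_step_ahead(ema_skater, ys))
print(prediction_feature(ema_skater, ys))
```

The shift in `prediction_feature` is the important design choice: the column placed next to `ys[i]` must be a forecast made before `ys[i]` was observed, otherwise the regression is handed a feature that has already seen its own target.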

This step can take a while. That’s why the timemachines library exists: to provide fast incremental time-series models (in addition to exposing others whose speed is what it is). If you want it to be even more painful, try including the most popular time-series packages released by major companies :)

Now we can re-run pycaret as before, with the new features added.

And that’s about all there is to it. If your example is like mine, you’ll see a small reduction in the reported errors. Of course, that will depend on the time-series you happened to choose. And as they say, past performance is no guarantee of future returns.

Chief Data Scientist, Intech Investments