
How to Use an LSTM for Timeseries and the Cold-start Problem


by Casey Fitzpatrick


Forecasting the energy consumption of a building can play a pivotal role in the operational efficiency of that building. Facility managers, utility companies, and commissioning projects use consumption forecasts to design and implement energy-saving policies that optimize building performance and reduce negative environmental impact.


Typically, the best forecasting algorithms use a lot of historical information about the building they're forecasting for––think past consumption trends, past sensor measurements, past building usage––to compute the forecasts. This requirement presents a big challenge: how can we make accurate predictions for new buildings, which don't have a long consumption history?

This is sometimes described as a "cold start" problem, where we need to make accurate predictions from little to no initial data. Is it possible to create an accurate forecast without much consumption history? That's where you come in!

In our brand new competition, we're asking you to accurately forecast building energy consumption from the beginning of a building's instrumentation life. Given only one day to two weeks of consumption history per building, your task is to use this "cold start" data to accurately forecast consumption. Some forecasts will ask for hourly consumption predictions, while others will ask for daily or even weekly projections.

In this post, we'll walk through a simple model for predicting energy consumption over different time horizons using varying amounts of "cold start" consumption data. We'll show you how to load the data, train a model to make some predictions, and then submit those predictions to the competition.

To get started, we import some standard data science libraries for loading, manipulating, and visualizing the data.

In [1]:
%matplotlib inline

# plotting
import matplotlib as mpl
mpl.style.use('ggplot')
import matplotlib.pyplot as plt

# math and data manipulation
import numpy as np
import pandas as pd

# to handle paths
from pathlib import Path

# set random seeds 
from numpy.random import seed
from tensorflow import set_random_seed

RANDOM_SEED = 2018
seed(RANDOM_SEED)
set_random_seed(RANDOM_SEED)

Loading the Data

On the data download page, we provide everything you need to get started:

  • consumption_train.csv - Historical series that can be used as training data.
  • cold_start_test.csv - Cold start lead-in periods for the series that appear in the submission format.
  • meta.csv - Information about the series that are available to competitors.
  • submission_format.csv - This is the format for submissions to the competition. Your submissions must match the columns and indices exactly.

Training Data

The main dataset we'll be using in this benchmark is consumption_train.csv. Since this is meant to be a simple first pass, we won't be using the meta features about the buildings, or even the temperature measurements that are provided in consumption_train.csv. It's up to you to take advantage of all the information!

In [2]:
data_path = Path('..', 'data', 'final', 'public')
consumption_train = pd.read_csv(data_path / 'consumption_train.csv', 
                                index_col=0, parse_dates=['timestamp'])
consumption_train.head()
Out[2]:
series_id timestamp consumption temperature
0 103088 2014-12-24 00:00:00 101842.233424 NaN
1 103088 2014-12-24 01:00:00 105878.048906 NaN
2 103088 2014-12-24 02:00:00 91619.105008 NaN
3 103088 2014-12-24 03:00:00 94473.706203 NaN
4 103088 2014-12-24 04:00:00 96976.755526 NaN

Each series_id has an associated time series of consumption and temperature data. For a given series, the measurements are provided at hourly time-steps. In other words, every row of consumption_train represents an hour of data for a particular series_id. Each unique series_id in consumption_train has four weeks, or 28 days, worth of hourly data.

Let's figure out how many series we have, and how much training data that corresponds to.

In [3]:
def describe_training_data(train_df):
    num_training_series = train_df.series_id.nunique()
    num_training_days = num_training_series * 28
    num_training_hours = num_training_days * 24
    assert num_training_hours == train_df.shape[0]
    
    desc = f'There are {num_training_series} training ' \
           f'series totaling {num_training_days} days ' \
           f'({num_training_hours} hours) of consumption data.'
    
    print(desc)
    
describe_training_data(consumption_train)
There are 758 training series totaling 21224 days (509376 hours) of consumption data.

That's a lot of data! In order to speed up development and facilitate rapid prototyping, let's reduce the training set. Since all of the training series are the same length, we'll uniformly sample a subset of series. The size of our resulting reduced data will be controlled by frac_series_to_use.

In [4]:
# choose subset of series for training
frac_series_to_use = 0.01

rng = np.random.RandomState(seed=RANDOM_SEED)
series_ids = consumption_train.series_id.unique()
series_mask = rng.binomial(1,
                           frac_series_to_use,
                           size=series_ids.shape).astype(bool)

training_series = series_ids[series_mask]

# reduce training data to series subset
consumption_train = consumption_train[consumption_train.series_id.isin(training_series)]

# describe the reduced set
describe_training_data(consumption_train)
There are 10 training series totaling 280 days (6720 hours) of consumption data.

That will speed things up a bit!

Test Data

The cold start data is provided in cold_start_test.csv. This is the data provided for each new series before the window that we need to predict. A different amount of cold-start data is provided for different series, and our model needs to work well whether we are given 1 day of data or 14.

In [5]:
cold_start_test = pd.read_csv(data_path / 'cold_start_test.csv', 
                              index_col=0, parse_dates=['timestamp'])
cold_start_test.head()
Out[5]:
series_id timestamp consumption temperature
0 102781 2013-02-27 00:00:00 15295.740389 17.00
1 102781 2013-02-27 01:00:00 15163.209562 18.25
2 102781 2013-02-27 02:00:00 15022.264079 18.00
3 102781 2013-02-27 03:00:00 15370.420458 17.00
4 102781 2013-02-27 04:00:00 15303.103213 16.90

The prediction windows are given by the prediction_window column of the submission format.

In [6]:
submission_format = pd.read_csv(data_path / 'submission_format.csv',
                                index_col='pred_id',
                                parse_dates=['timestamp'])
submission_format.head()
Out[6]:
series_id timestamp temperature consumption prediction_window
pred_id
0 102781 2013-03-03 19.931250 0.0 daily
1 102781 2013-03-04 20.034375 0.0 daily
2 102781 2013-03-05 19.189583 0.0 daily
3 102781 2013-03-06 18.397917 0.0 daily
4 102781 2013-03-07 20.762500 0.0 daily

Let's take a closer look at these time series.

Explore the Data

As mentioned above, the task in this challenge is to utilize "cold start" consumption data to generate consumption forecasts for various prediction windows. Each series_id in the test set asks for one (and only one) of three prediction windows:

  • hourly - predict consumption each hour for a day (24 predictions)
  • daily - predict consumption each day for a week (7 predictions)
  • weekly - predict consumption each week for two weeks (2 predictions)

The prediction window types are pretty evenly distributed across the series_ids.

In [7]:
# confirm that every series asks for only one type of prediction
assert all(1 == submission_format.groupby('series_id').prediction_window.nunique())

# use the first() prediction window value from a series_id so as not to overcount
submission_format.groupby('series_id').prediction_window.first().value_counts()
Out[7]:
hourly    242
weekly    192
daily     191
Name: prediction_window, dtype: int64

For each series_id in the test set, the amount of hourly cold start data you are provided is never less than one day (24 consecutive consumption measurements) and can be as much as two weeks (336 consecutive consumption measurements). Cold start data only comes in full-day chunks. In other words, some series may have 3, 4, ..., 14 days worth of cold start data, but no series have 3.5, 4.1, ..., 13.9, or any other partial-day's worth of cold start data.

Since each timestep represents an hour, we can visually confirm the above claim with a simple bar plot counting the number of days in each test series.

In [8]:
ax = (cold_start_test.groupby('series_id').count()
                                           .timestamp
                                           .divide(24)
                                           .value_counts()
                                           .sort_index()
                                           .plot.bar())
ax.set_xlabel('Days in Cold Start Data')
ax.set_ylabel('Number of Test Series')
plt.show()
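
Beyond eyeballing the bar plot, we can also check the full-day-chunks claim programmatically. Here's a quick sanity-check sketch of our own (not part of the original benchmark code), using the cold_start_test frame loaded above:

# sanity check (our addition): every test series should contain between
# 1 and 14 full days of hourly cold start data, i.e., a multiple of 24 rows
series_lengths = cold_start_test.groupby('series_id').timestamp.count()
assert (series_lengths % 24 == 0).all()
assert series_lengths.between(24, 14 * 24).all()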

Some series may provide little cold start data and ask for a long (weekly) forecast, while others may provide many days of cold start data and ask for a short (hourly) forecast. Let's take a look at how these scenarios are distributed in the test set.

First, for convenience, we'll add prediction_window to the test data.

In [9]:
# add prediction_window to the test data
pred_windows = submission_format[['series_id', 'prediction_window']].drop_duplicates()
cold_start_test = cold_start_test.merge(pred_windows, on='series_id')

num_cold_start_days_provided = (cold_start_test.groupby('series_id')
                                               .prediction_window
                                               .value_counts()
                                               .divide(24))

num_cold_start_days_provided.head()
Out[9]:
series_id  prediction_window
100004     weekly                4.0
100010     hourly               12.0
100012     hourly               10.0
100020     weekly               13.0
100028     daily                13.0
Name: prediction_window, dtype: float64

Now we'll plot the distribution of prediction_window for each unique length of cold start data in the test set.

In [10]:
def _count_cold_start_days(subdf):
    """ Get the number of times a certain cold-start period
        appears in the data.
    """
    return (subdf.series_id
                 .value_counts()
                 .divide(24)  # hours to days 
                 .value_counts())

cold_start_occurrence = (cold_start_test.groupby('prediction_window')
                                        .apply(_count_cold_start_days))

cold_start_occurrence.head()
Out[10]:
prediction_window      
daily              1.0     19
                   8.0     18
                   13.0    18
                   6.0     15
                   9.0     15
Name: series_id, dtype: int64
In [11]:
ax = cold_start_occurrence.unstack(0).plot.bar(figsize=(10, 6),
                                               rot=0)

ax.set_xlabel('Number of Cold-start Days')
ax.set_ylabel('Number of Series in Dataset');

It looks like all of the combinations are represented, though some appear a bit more often than others. However, the imbalance does not seem large enough to be worth building into our algorithm.

Of course, you aren't only provided with cold start data. The main training set, consumption_train, has many series, each with 4 weeks worth of hourly data. To gain a sense of what the cold start prediction asks of you, let's look at a few examples of how the cold start data leads up to the prediction window.

In [12]:
# visualize the cold start and prediction windows
num_to_plot = 40

fig, ax = plt.subplots(figsize=(45, 15))

cold_start_map = {'hourly': 1, 'daily': 7, 'weekly': 14}

rng = np.random.RandomState(seed=5425)
series_to_plot = rng.choice(cold_start_test.series_id.unique(), num_to_plot)

for i, series_id in enumerate(series_to_plot):
    # get relevant information about the test series from coldstart data
    series_data = cold_start_test[cold_start_test.series_id == series_id]
    start_cold = series_data.timestamp.min()
    start_preds = series_data.timestamp.max()
    
    # get prediction stop from submission format
    stop_preds = submission_format[submission_format.series_id == series_id].timestamp.max()
    
    # plot the cold start and prediction window relative sizes
    ax.plot([start_cold, start_preds], [i, i], c='#6496e5', linewidth=num_to_plot / 3)
    ax.plot([start_preds, stop_preds], [i, i], c='#e25522', linewidth=num_to_plot / 3)

# the y tick labels don't mean anything so turn them off
ax.set_yticks([])
plt.tick_params(labelsize=35)
plt.legend(['Cold Start (1-14 Days)', 
            'Prediction Window'], 
           fontsize=30)

plt.title(f'{num_to_plot} Series Cold Starts And Their Prediction Windows', fontsize=50);

Hopefully this gives you a sense of how you can use the training data to simulate all of the possible cold start scenarios you'll encounter in the test set. For example, at one extreme the 4 weeks can be split into 2 weeks of simulated cold start, followed by two weeks worth of hourly consumption that you can use to create a validation set (by summing the consumption for each week, leading to 2 weekly predictions). At the other extreme, you can take the first day of consumption data from a 4 week long series as cold start and forecast the next 2 weeks in the same way.

Using the series in consumption_train, all cold start windows from 1 to 14 days can be created, with a minimum of 14 more days left over for creating validation sets.
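
To make that concrete, here's a hedged sketch of a helper (our own addition; the name and defaults are assumptions, not benchmark code) that carves a simulated cold start window and a matching validation target out of a single 28-day training series:

def simulate_cold_start(series_df, num_cold_days, pred_window='daily'):
    """ Split one 28-day training series into a simulated cold start
        window plus the aggregated targets a model would be scored on.
        Illustrative only.
    """
    hours = num_cold_days * 24
    cold = series_df.iloc[:hours]
    future = series_df.consumption.iloc[hours:]

    if pred_window == 'hourly':    # 24 hourly targets
        target = future.iloc[:24].values
    elif pred_window == 'daily':   # 7 daily sums
        target = future.iloc[:7 * 24].values.reshape(7, 24).sum(axis=1)
    else:                          # 2 weekly sums
        target = future.iloc[:14 * 24].values.reshape(2, 7 * 24).sum(axis=1)

    return cold, target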

Finally, let's take a look at the consumption patterns in a few series to confirm that each one spans 4 weeks.

In [13]:
# plot a few consumption patterns
series_to_plot = rng.choice(consumption_train.series_id.unique(), 3)

for ser_id in series_to_plot:
    ser_data = consumption_train[consumption_train.series_id == ser_id]
    ax = ser_data.plot(x='timestamp',
                       y='consumption', 
                       title=f"series_id {int(ser_id)}",
                       legend=False)
    plt.ylabel('consumption (watt-hours)')
    plt.show()

We can see that all of the series in our (reduced) training data have the same length.

In [14]:
(consumption_train.groupby('series_id')
                  .timestamp
                  .apply(lambda x: x.max() - x.min())
                  .value_counts())
Out[14]:
27 days 23:00:00    10
Name: timestamp, dtype: int64

The Error Metric - Normalized Mean Absolute Error

The performance metric is a normalized version of mean absolute error. The normalization coefficient $c_i$ for the $i^{\text{th}}$ prediction is the ratio of two numbers,

$$ c_i = \frac{w_i}{m_i} $$

where $w_i$ is a weight that makes weekly ($24 / 2$), daily ($24 / 7$), and hourly ($24 / 24$) predictions equally important, and $m_i$ is the true mean consumption over the prediction window under consideration (this mean is unknown to competitors). Multiplying each absolute error by this coefficient weights every prediction equally and puts hourly, daily, and weekly predictions on the same scale.

Given the above definition, the metric $NMAE$ is

$$ NMAE = \frac{1}{N} \sum_{i=1}^{N}|\hat{y_i} - y_i|c_i $$
  • $N$ - The total number of consumption predictions submitted, including all hourly, daily, and weekly predictions
  • $\hat{y}$ - The predicted consumption value
  • $y$ - The actual consumption value
  • $c$ - The normalization coefficient that weights and scales each prediction to have the same impact on the metric

Note: The coefficients $c$ are not available to competitors for the test set because the true mean consumption for the $i^{\text{th}}$ prediction window, $m_i$, is not available to competitors. For the training set, competitors can calculate $m_i$ when creating their own validation sets.
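
For local validation, here's a sketch of the metric in code (our own hedged implementation, not the official scoring code; it assumes a hypothetical DataFrame with one row per prediction and columns series_id, prediction_window, consumption for the true values, and prediction for the model output):

import numpy as np

def nmae(df):
    # w_i: equalize hourly (24 preds), daily (7 preds), weekly (2 preds)
    w = df.prediction_window.map({'hourly': 24 / 24,
                                  'daily': 24 / 7,
                                  'weekly': 24 / 2})

    # m_i: true mean consumption over each prediction window
    m = df.groupby('series_id').consumption.transform('mean')

    c = w / m
    return (np.abs(df.prediction - df.consumption) * c).mean()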

Build the Model

This challenge asks you to take variable length time series as inputs (the cold start data) and map them to variable length time series as outputs (the prediction window). Putting our jargon hats on, we can call this a variable-length-input, variable-length-output, sequence-to-sequence time series regression problem. While there are many approaches to time series problems, we're going to use a straightforward strategy that relies on a small Long Short-Term Memory (LSTM) neural network.

Long Short-Term Memory (LSTM) Network

Long Short-Term Memory (LSTM) networks can learn patterns over long sequences and do not necessarily rely on a pre-specified window of lagged observations as input. Further, their internal state can be built up from the cold start data before making however many forecast predictions we're asked for. This means that a single LSTM model can handle all prediction horizons in the data. Our overall approach will be to train the network on the 4 week series, use the cold start data to prime the network's internal state for forecasting, and then forecast.


(Above) LSTM in unrolled form, showing how data moves through the graph. The inputs $x$ are processed by interacting networks inside of the cell $A$, yielding a "hidden state" value $h$ which moves through to the next time step, where the process is repeated. The hidden state is updated––either "forgetting" old information or "storing" new information––through linear interactions with the sub-cellular networks as each time step is processed. If you'd like more detail, this image is from a useful blog post on the model. If you're new to LSTMs, fear not! Keras does most of the work for us.


Below we'll go through what's needed to set up a one-step forecast model using the Keras LSTM layer. First, let's consider how to preprocess the data.

Preparing the data

Appropriately preprocessing data is always important. To prepare our data for modeling with an LSTM, let's consider a few standard approaches.

Scaling

Due to their dependence on backpropagation, LSTM networks behave best when the data is within the scale of the activation function used (tanh by default in LSTM, with output range $(-1, 1)$). Scikit-learn's MinMaxScaler is sufficient for this task, since it allows us to specify a range and invert transformations.
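
As a tiny illustration (our own snippet, not from the benchmark) of the fit/invert round trip we'll rely on later:

from sklearn.preprocessing import MinMaxScaler
import numpy as np

vals = np.array([100.0, 250.0, 400.0]).reshape(-1, 1)

scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(vals)           # maps to [[-1.], [0.], [1.]]
restored = scaler.inverse_transform(scaled)   # back to the original values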

Stationarity

Often in timeseries problems, you want to ensure that your data is stationary, and make it so if it isn't (for example using differencing). That is beyond the scope of this benchmark!
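
If you do want to explore stationarity, first-order differencing is a common starting point. Here's a hedged sketch (illustrative only; in practice you'd difference each series_id separately rather than the concatenated frame):

# first-order differencing removes the trend between consecutive steps
series = consumption_train.consumption
diffed = series.diff().dropna()

# the transformation is invertible, which matters for forecasting
restored = diffed.cumsum() + series.iloc[0]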

Converting Timeseries to Supervised

We'll use a standard lagged variable approach to convert the timeseries problem to a supervised learning problem, where given features $X$ and labels $y$ the model learns a mapping between them. In this case, the features are our lagged series and the label is the original series. This process can be used with any model, but the advantage here is that we can use a stateful LSTM that learns dependencies between each timestep in the data.

Creating Lagged Features for Timeseries

As mentioned above, we'll want to convert each series into a matrix of lagged variables. The Pandas shift method on DataFrames makes this easy.

In [15]:
def create_lagged_features(df, lag=1):
    if not isinstance(df, pd.DataFrame):
        df = pd.DataFrame(df, columns=['consumption'])
    
    def _rename_lag(ser, j):
        ser.name = ser.name + f'_{j}'
        return ser
        
    # add a column lagged by `i` steps
    for i in range(1, lag + 1):
        df = df.join(df.consumption.shift(i).pipe(_rename_lag, i))

    df.dropna(inplace=True)
    return df

# example series
test_series = consumption_train[consumption_train.series_id == 100283]
create_lagged_features(test_series.consumption, lag=3).head()
Out[15]:
consumption consumption_1 consumption_2 consumption_3
7395 562464.244579 570699.933083 561278.342934 527862.165731
7396 565464.153881 562464.244579 570699.933083 561278.342934
7397 593757.048480 565464.153881 562464.244579 570699.933083
7398 606651.971117 593757.048480 565464.153881 562464.244579
7399 570132.762731 606651.971117 593757.048480 565464.153881

Function to Prepare Training Data for Model

For each series, we'll need to scale the data, then convert it to a supervised learning problem. Let's create a function, prepare_training_data, that we can call on each sample in our training loop.

In [16]:
from sklearn.preprocessing import MinMaxScaler

def prepare_training_data(consumption_series, lag):
    """ Converts a series of consumption data into a
        lagged, scaled sample.
    """
    # scale training data
    scaler = MinMaxScaler(feature_range=(-1, 1))
    consumption_vals = scaler.fit_transform(consumption_series.values.reshape(-1, 1))
    
    # convert consumption series to lagged features
    consumption_lagged = create_lagged_features(consumption_vals, lag=lag)

    # X, y format taking the first column (original time series) to be the y
    X = consumption_lagged.drop('consumption', axis=1).values
    y = consumption_lagged.consumption.values
    
    # keras expects 3 dimensional X
    X = X.reshape(X.shape[0], 1, X.shape[1])
    
    return X, y, scaler

_X, _y, scaler = prepare_training_data(test_series.consumption, 5)
print(_X.shape)
print(_y.shape)
print(scaler)
(667, 1, 5)
(667,)
MinMaxScaler(copy=True, feature_range=(-1, 1))

Make Model

Now we're ready to build the model. A single LSTM cell followed by a dense layer for regression is a simple base model that we can start with. Let's import Keras objects with a TensorFlow backend. (If you're new to TensorFlow and Keras, see installation instructions here.)

In [17]:
# modeling
import keras
from keras.models import Sequential
from keras.layers import LSTM, Dense

# progress bar
from tqdm import tqdm
Using TensorFlow backend.

As usual with Keras, getting from imports to a simple custom model can be done in just a few lines. Next we'll instantiate a Keras Sequential configuration and add to it a single recurrent cell followed by a dense layer to output regressed values. We'll also keep the number of neurons in the LSTM cell (this is just the dimensionality of the LSTM output) and the number of passes through the data small.

Note the importance of setting stateful=True below. This allows the LSTM to learn the patterns within a series (the lagged series data). Otherwise the state would be reset after each batch (here, each single time step), but we want the model to develop a state that depends on every timestep we show it.

In [18]:
# lag of 24 to simulate smallest cold start window. Our series
# will be converted to a num_timesteps x lag size matrix
lag = 24

# model parameters
num_neurons = 24
batch_size = 1  # this forces the lstm to step through each time-step one at a time
batch_input_shape = (batch_size, 1, lag)

# instantiate a sequential model
model = Sequential()

# add LSTM layer - stateful MUST be true here in 
# order to learn the patterns within a series
model.add(LSTM(units=num_neurons,
               batch_input_shape=batch_input_shape,
               stateful=True))

# followed by a dense layer with a single output for regression
model.add(Dense(1))

# compile
model.compile(loss='mean_absolute_error', optimizer='adam')

Fit Model

Now for the fun part! Above we showed that the series in consumption_train could be used to simulate all sorts of cold start scenarios. Here we're just going to fit each 4 week series using a 24 hour lag, so that ideally the model learns, at a minimum, patterns over a single day.

To train, we pass through the consumption data, grouped by series_id. For each series, we first prepare_training_data, then fit using that data, resetting the state only after each row of the series has been seen. In this way, each lagged timeseries is learned by the network before the state is reset and the next series is passed in.

To keep this example quick, we're only going to make a few passes through the data. An easy way to improve this benchmark is to simply increase num_passes_through_data below (to a much larger number).

In [19]:
%%time
num_training_series = consumption_train.series_id.nunique()
num_passes_through_data = 3

for i in tqdm(range(num_passes_through_data), 
              total=num_passes_through_data, 
              desc='Learning Consumption Trends - Epoch'):
    
    # reset the LSTM state for training on each series
    for ser_id, ser_data in consumption_train.groupby('series_id'):

        # prepare the data
        X, y, scaler = prepare_training_data(ser_data.consumption, lag)

        # fit the model: note that we don't shuffle batches (it would ruin the sequence)
        # and that we reset states only after an entire X has been fit, instead of after
        # each (size 1) batch, as is the case when stateful=False
        model.fit(X, y, epochs=1, batch_size=batch_size, verbose=0, shuffle=False)
        model.reset_states()
Learning Consumption Trends - Epoch: 100%|██████████| 3/3 [00:38<00:00, 12.92s/it]
CPU times: user 1min 6s, sys: 14.3 s, total: 1min 20s
Wall time: 38.8 s

Time to Predict and Submit

To generate our submission, we'll loop over the submission format, grouping by series_id as above. Within each iteration, we'll need to

  • prepare the cold start data
  • generate the forecast

To generate the forecast, we'll

  • build up the internal state using the prepared cold start data
  • predict all hours in the prediction window
  • aggregate the forecast appropriately for the submission

To simplify the prediction loop, let's write a function for the forecast itself (we'll reuse prepare_training_data from above for the cold start data).

Function to Generate Forecast

The forecast starts by using the cold start data to build up the model's internal state. The final cold start values then initialize the forecast, and the output of each prediction is used as input to the next. Finally, we invert the scaling at the end to get back to normal consumption values.

In [20]:
def generate_hourly_forecast(num_pred_hours, consumption, model, scaler, lag):
    """ Uses last hour's prediction to generate next for num_pred_hours, 
        initialized by most recent cold start prediction. Inverts scale of 
        predictions before return.
    """
    # allocate prediction frame
    preds_scaled = np.zeros(num_pred_hours)
    
    # initial X is last lag values from the cold start
    X = scaler.transform(consumption.values.reshape(-1, 1))[-lag:]
    
    # forecast
    for i in range(num_pred_hours):
        # predict scaled value for next time step
        yhat = model.predict(X.reshape(1, 1, lag), batch_size=1)[0][0]
        preds_scaled[i] = yhat
        
        # update X to be latest data plus prediction
        X = pd.Series(X.ravel()).shift(-1).fillna(yhat).values

    # revert scale back to original range
    hourly_preds = scaler.inverse_transform(preds_scaled.reshape(-1, 1)).ravel()
    return hourly_preds

Before we start forecasting, let's make a copy of the submission format for our submission.

In [21]:
# copy submission format and fill in values
my_submission = submission_format.copy()

Make Predictions: Forecast the Cold Start Data

Our forecast essentially uses the two functions above (prepare_training_data and generate_hourly_forecast). The only other step is that some predictions need to be aggregated, since our forecasts all start as hourly. The dictionary pred_window_to_num_preds specifies the aggregation, then NumPy's split function can be used to sum over the right number of hours.

Note: Because this basic approach generates a forecast for every hour of the prediction window, it takes a bit of time (10-15 minutes) to generate all predictions.

In [22]:
%%time
pred_window_to_num_preds = {'hourly': 24, 'daily': 7, 'weekly': 2}
pred_window_to_num_pred_hours = {'hourly': 24, 'daily': 7 * 24, 'weekly': 2 * 7 * 24}

num_test_series = my_submission.series_id.nunique()

model.reset_states()

for ser_id, pred_df in tqdm(my_submission.groupby('series_id'), 
                            total=num_test_series, 
                            desc="Forecasting from Cold Start Data"):
        
    # get info about this series' prediction window
    pred_window = pred_df.prediction_window.unique()[0]
    num_preds = pred_window_to_num_preds[pred_window]
    num_pred_hours = pred_window_to_num_pred_hours[pred_window]
    
    # prepare cold start data
    series_data = cold_start_test[cold_start_test.series_id == ser_id].consumption
    cold_X, cold_y, scaler = prepare_training_data(series_data, lag)
    
    # fine tune our lstm model to this site using cold start data    
    model.fit(cold_X, cold_y, epochs=1, batch_size=batch_size, verbose=0, shuffle=False)
    
    # make hourly forecasts for duration of pred window
    preds = generate_hourly_forecast(num_pred_hours, series_data, model, scaler, lag)
    
    # reduce by taking sum over each sub window in pred window
    reduced_preds = [pred.sum() for pred in np.split(preds, num_preds)]
    
    # store result in submission DataFrame
    ser_id_mask = my_submission.series_id == ser_id
    my_submission.loc[ser_id_mask, 'consumption'] = reduced_preds
Forecasting from Cold Start Data: 100%|██████████| 625/625 [05:20<00:00,  1.95it/s]
CPU times: user 8min 5s, sys: 1min 35s, total: 9min 41s
Wall time: 5min 20s

Check Scale of Prediction

As a quick check, we expect the mean hourly prediction to be smaller than the mean daily prediction, which in turn should be smaller than the mean weekly prediction.

In [23]:
my_submission[my_submission.prediction_window == 'hourly'].consumption.describe()
Out[23]:
count    5.808000e+03
mean     1.665625e+05
std      6.166709e+05
min     -5.872563e+03
25%      1.606849e+04
50%      3.669760e+04
75%      9.310056e+04
max      5.200182e+06
Name: consumption, dtype: float64
In [24]:
my_submission[my_submission.prediction_window == 'daily'].consumption.describe()
Out[24]:
count    1.337000e+03
mean     3.351025e+06
std      9.992965e+06
min     -3.386855e+04
25%      3.716496e+05
50%      8.894688e+05
75%      2.700579e+06
max      1.166669e+08
Name: consumption, dtype: float64
In [25]:
my_submission[my_submission.prediction_window == 'weekly'].consumption.describe()
Out[25]:
count    3.840000e+02
mean     2.308341e+07
std      6.446955e+07
min     -1.002589e+06
25%      2.597437e+06
50%      7.459545e+06
75%      1.883285e+07
max      7.391700e+08
Name: consumption, dtype: float64
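
We can also encode that expectation as a quick assertion (a sanity check of our own, not part of the original benchmark):

# mean consumption should grow with the length of the prediction window
means = my_submission.groupby('prediction_window').consumption.mean()
assert means['hourly'] < means['daily'] < means['weekly']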

Did the network learn to forecast reasonable looking patterns?

In [26]:
# plot samples
sample_ser = (my_submission[my_submission.prediction_window == 'hourly']
                           .series_id
                           .sample().values[0])

(my_submission[my_submission.series_id == sample_ser]
              .plot(x='timestamp',
                    y='consumption',
                    title=sample_ser,
                    rot=90))
Out[26]:
<matplotlib.axes._subplots.AxesSubplot at 0x1222a6828>

Save Submission

In [27]:
my_submission.head(5)
Out[27]:
series_id timestamp temperature consumption prediction_window
pred_id
0 102781 2013-03-03 19.931250 369780.532281 daily
1 102781 2013-03-04 20.034375 371794.340190 daily
2 102781 2013-03-05 19.189583 371649.627674 daily
3 102781 2013-03-06 18.397917 371653.966432 daily
4 102781 2013-03-07 20.762500 371655.221417 daily
In [28]:
save_path = Path('..', 'data', 'submissions')
save_path.mkdir(exist_ok=True, parents=True)

my_submission.to_csv(save_path / "my_submission.csv", index_label='pred_id')
In [29]:
!head ../data/submissions/my_submission.csv
pred_id,series_id,timestamp,temperature,consumption,prediction_window
0,102781,2013-03-03 00:00:00,19.93125,369780.5322810872,daily
1,102781,2013-03-04 00:00:00,20.034375,371794.3401899857,daily
2,102781,2013-03-05 00:00:00,19.18958333333333,371649.62767406576,daily
3,102781,2013-03-06 00:00:00,18.397916666666667,371653.9664321951,daily
4,102781,2013-03-07 00:00:00,20.7625,371655.2214171451,daily
5,102781,2013-03-08 00:00:00,19.8,371654.82210375194,daily
6,102781,2013-03-09 00:00:00,20.466666666666665,371654.9294821434,daily
7,103342,2013-06-26 00:00:00,10.486607142857146,10889260.963609295,weekly
8,103342,2013-07-03 00:00:00,10.006547619047616,10052533.084825085,weekly

Submit to the Leaderboard

Woohoo! It's a start! And that's exactly what we intend with these benchmarks. We're sure you'll be able to top this model in no time, and we can't wait to see what you come up with. Happy forecasting!