Detection of Structural Change in Dynamic Linear Models:

The pybats-detection package

André F. B. Menezes
Maynooth University

Joint work with Eduardo G. Pinheiro and Hélio S. Migon

October 05, 2022
Maynooth, TSE seminar

Today’s plan

  1. Introduction
  2. Bayesian Dynamic Linear Models
  3. Automatic Monitoring
  4. The pybats-detection package
  5. Example

Take-away message

  • pybats-detection is a Python package developed for Bayesian monitoring and intervention in univariate time series data.

Introduction

Background

What is a structural change?

  • When the time series abruptly changes at a point in time.

  • This could happen because of changes in the location, scale or other parameters of the process.

  • In such cases the model should be modified and adapted.

Structural change in ecological data

  • Analyses of structural changes in ecological time series (ASCETS) by Östman et al. (2020)

  • Study of ecological indicators over time such as:

    • species abundances;
    • reproduction success;
    • species density or biomass
  • Detecting structural change in ecological time series and adapting is fundamental to improve future forecasts as well as for inference purposes.

Bayesian Dynamic Linear Model

Overview

  • Dynamic Linear Models (DLMs) are a natural extension of linear (regression) models.

  • Go further to the well-known Kalman filter.

  • Based on Bayes’ theorem.

  • Takes into account the sequential nature of the data.

Definition

  • A DLM is defined by the following observation and evolution equations:

\[\begin{eqnarray} Y_t &=& \mathbf{F}_t^\top\,\boldsymbol{\theta}_t + \nu_t, \qquad \nu_t \sim N[0, V_t] \label{eq:dlm_observation} \\ \label{eq:dlm_state_evolution} \boldsymbol{\theta}_t &=& \mathbf{G}_t\,\boldsymbol{\theta}_{t-1} + \boldsymbol{\omega}_t, \quad \boldsymbol{\omega}_t \sim N[\mathbf{0}, \mathbf{W}_t] \end{eqnarray}\]

  • \(\boldsymbol{\theta}_t\) is the \(p \times 1\) state vector of model coefficients at time \(t\).
  • \(\mathbf{F}_t\) is a \(p\)-dimensional vector of known constant or regressors at time \(t\).

Definition

  • \(\mathbf{G}_t\) is a known \(p \times p\) matrix, referred as state evolution matrix at time \(t\).

  • \(V_t\) is the observation variance.

  • \(\mathbf{W}_t\) is the state evolution variance-covariance matrix.

  • The quadruple \(\{ \mathbf{F}, \mathbf{G}, V, \mathbf{W}\}_t\) describes the class of DLMs.

A particular DLM

  • A very useful model in the class of DLM is the linear growth or second-order polynomial model.

  • Let be \(\mathbf{F}_t = (1, 0)^ \top\) and \(\mathbf{G}_t = \left( \begin{matrix} 1 &1 \\ 0 & 1 \end{matrix}\right)\) we obtain the linear growth model: \[\begin{eqnarray} y_t &=& \mu_t + \nu_t \\ \mu_t &=& \mu_{t-1} + \beta_t + \delta\mu_t \\ \beta_t &=& \beta_{t-1} + \delta\beta_t \end{eqnarray}\]

  • Suitable for medium-term forecasts in time series with trend patterns.

Inference Aspects

  • The two main stages involved in the sequential inference are: (i) evolution and (ii) updating

Automatic Monitoring

Steps

  1. Define the local Bayes factor, \({B_t(k)}\);

  2. Create an alternative DLM (\(M_1\)) describing a level or scale shift; and

  3. Define the subjective and automatic interventions.

  • The method assesses the model’s performance based on purely statistical measures.

Local Bayes factor

The local Bayes factor is defined as

\[B_t(M_0, M_1) = \dfrac{p(y_t \mid M_0)}{p(y_t \mid M_1)}\] where \(p(y_t \mid M_i)\) is the predictive density distribution and \(M_0\) and \(M_1\) denote the current and alternative models, respectively.

Local Bayes factor

At time \(t\), the monitor will be based on the most recent \(k\) observations, which are defined as: \(B_t(k)=\frac{p(y_t,\ldots, y_{t-k+1} \mid M_0)}{p(y_t,\ldots, y_{t-k+1} \mid M_1)}\). Two other important quantities are cumulative Bayes factor and the run-length: \[\begin{eqnarray*} L_t&=& B_t \, \min_{1 \le k \le t}\{1, L_{t-1}\}\\ l_t&=& 1+ l_{t-1}\,\times \, I_{(-\infty,1)}(L_{t-1}) \end{eqnarray*}\] where \(B_t=B_t(0)\).

Standardised forecasts errors

Evaluating the model consistency of \(y_t\) is equivalent to evaluating for the one-step-ahead standardized forecasting error (\(e_t\)) distribution which is \[ e_t \sim N[0, 1] \] and so \[ p(e_t \mid M_0) \propto \exp\left\{-0.5 e^2_t \right\} \]

Alternative model for location

A natural alternative model, \(M_1\), assumed that \(e_t\) has a non-zero mean \(h\), which has p.d.f. \[ p(e_t \mid M_1) \propto \exp\left\{-0.5 (e_t - h)^2\right\} \]

Thus, the local Bayes factor at time \(t\) is given by: \[ B_t = \exp\left\{-0.5 (h^2 - 2 h e_t)\right\} \]

Alternative model for location

The sensitivity of the monitor depends on:

  1. A threshold \(\tau\) below which the predictive error must be incompatible with the current DLM, \(M_0\);
  2. The shift mean value, \(h\) for the alternative model, \(M_1\);
  3. \(l_{max}\), the maximum number of terms to be considered in \(B_t(k)\).
  • Recommended values are \(l_{max} = 3, h = 4\) and \(\tau = 0.135\), since these values lead to indifference between \(M_0\) and \(M_1\) when \(e_t\) is near to \(2.0\).

Alternative model for location

Automatic adaptation

  • After the detection of an onset of change or outliers adaptation by increasing parameter uncertainties is incorporated in the model.

pybats-detection

Package Overview

  • pybats-detection is built under Object Oriented Programming using Python classes

  • The three main classes of pybats-detection are: Smoothing, Intervention, and Monitoring.

  • The classes depends on the pybats.dglm.dlm object from the PyBATS package (Lavine and Cron, 2021). A DLM can be initialized in PyBATS as follows:

> from pybats.dglm import dlm
> dlm_obj = dlm(a0=None, R0=None, nregn=0, ntrend=0, 
>               seasPeriods=[], seasHarmComponents=[], 
>               deltrend=1, delregn=1, delseas=1,
>               **kwargs)

The Monitoring class

An instance of the Monitoring class can be initialized as follows:

> from pybats_detection.monitor import Monitoring
> monitoring_learning = Monitoring(
>     mod: pybats.dglm.dlm, prior_length: int = 10, 
>     bilateral: bool = False, smooth: bool = True, 
>     interval: bool = True, level: float = 0.05)

The fit method of Monitoring performs the automatic monitoring.

> monitoring_res = monitoring_learning.fit(
>      y: pandas.Series, X: pandas.DataFrame = None,
>      h: int = 4, tau: float = 0.135,
>      discount_factors: dict = {
>         "trend": 0.10, "seasonal": 0.90, "reg": 0.98},
>      distr: str = "normal", type: str = "location", 
>      verbose: bool = True)

Vignettes

  • Further details can be found in package vignettes available at:

Example

Simulated data

  • The data is simulated from a Normal generating process \(N[\mu, \sigma^2]\) with \(\sigma^2 = 0.5^2\)

  • And \(\mu = (100, 104, 98)\) for the first \(40\) observations, the following \(30\), and the last \(30\), respectively.

The model

A level DLM is used to described the data. The model is given by: \[\begin{eqnarray} Y_t &=& \mu_t + \nu_t, \qquad \nu_t \sim N[0, V_t] \label{eq:level_dlm} \\ \nonumber \mu_t &=& \mu_{t-1} + \omega_t, \quad \omega_t \sim N[0, W_t] \end{eqnarray}\] where the evolution \(\omega_t\) represents the stochastic change in level between times \(t - 1\) and \(t\).

  • This model has \(\mathbf{F}_t = \mathbf{G}_t = 1\).

Using Monitoring class

Code
> a = np.array([100])
> R = np.eye(1)
> R[[0]] = 100
> mod = dlm(a, R, ntrend=1, deltrend=0.90)
> monitor = Monitoring(mod=mod)
> monitor_results = monitor.fit(
>     y=data_location["y"], h=4, tau=0.135,
>     discount_factors={"trend": [0.10]},
>     distr_type="location", bilateral=True, prior_length=4)
Upper potential outlier detected at time 41 with H=2.581e-16, L=2.581e-16 and l=1
Upper potential outlier detected at time 42 with H=2.477e-13, L=2.477e-13 and l=1
Upper potential outlier detected at time 43 with H=6.881e-05, L=6.881e-05 and l=1
Lower potential outlier detected at time 71 with H=1.438e-26, L=1.438e-26 and l=1
Lower potential outlier detected at time 72 with H=2.959e-21, L=2.959e-21 and l=1
Lower potential outlier detected at time 73 with H=7.951e-08, L=7.951e-08 and l=1

Graphical comparison

One-step-ahead forecasts with 95% credible interval.

Concluding remarks

Concluding remarks

  • pybats-detection provides methods for identifying structural changes and outliers in Bayesian Dynamic Linear Models.

  • The methodology is scalable and efficient for analysing real time series data.

  • To extend the intervention and monitoring for other class of models, such as transfer functions, DGLM and DGEGM.

References

  • Harrison, P.J., Stevens, C.F., 1976. Bayesian forecasting. Journal of the Royal Statistical Society. Series B (Methodological) 38, 205–247. doi:https:// doi.org/10.1111/j.2517-6161.1976.tb01586.x.

  • Lavine, I., Cron, A., 2021. PyBATS: Bayesian time series modeling and fore- casting. URL: https://lavinei.github.io/pybats/. python package version 0.0.5.

  • West, M., Harrison, J., 1989. Subjective intervention in formal models. Journal of Forecasting 8, 33–53. doi:10.1002/for.3980080104.

  • West, M., Harrison, P.J., 1986. Monitoring and adaptation in Bayesian fore- casting models. Journal of the American Statistical Association 81, 741–750. doi:10.1080/01621459.1986.10478331.

  • West, M., Harrison, P.J., 1997. Bayesian Forecasting and Dynamic Models. 2nd ed., Springer, New York.

Thank you!