Interrupted Time Series (ITS) with scikit-learn models#

This notebook shows an example of using interrupted time series, where we do not have untreated control units of a similar nature to the treated unit and we just have a single time series of observations and the predictor variables are simply time and month.

import pandas as pd
from sklearn.linear_model import LinearRegression

import causalpy as cp

%config InlineBackend.figure_format = 'retina'

Load data#

df = (
    cp.load_data("its")
    .assign(date=lambda x: pd.to_datetime(x["date"]))
    .set_index("date")
)

treatment_time = pd.to_datetime("2017-01-01")
df.head()

	month	year	t	y
date
2010-01-31	1	2010	0	25.058186
2010-02-28	2	2010	1	27.189812
2010-03-31	3	2010	2	26.487551
2010-04-30	4	2010	3	31.241716
2010-05-31	5	2010	4	40.753973

Run the analysis#

result = cp.InterruptedTimeSeries(
    df,
    treatment_time,
    formula="y ~ 1 + t + C(month)",
    model=LinearRegression(),
)

Examine the results#

fig, ax = result.plot()

../_images/217f704bd93c87799e6e4a03bb8c3c283c6a6874787b5aed83a3b5fbf2350db3.png

result.summary(round_to=3)

==================================Pre-Post Fit==================================
Formula: y ~ 1 + t + C(month)
Model coefficients:
  Intercept     	         0
  C(month)[T.2] 	      2.85
  C(month)[T.3] 	      1.16
  C(month)[T.4] 	      7.15
  C(month)[T.5] 	        15
  C(month)[T.6] 	      24.8
  C(month)[T.7] 	      18.2
  C(month)[T.8] 	      33.5
  C(month)[T.9] 	      16.2
  C(month)[T.10]	      9.19
  C(month)[T.11]	      6.28
  C(month)[T.12]	     0.578
  t             	      0.21

Effect Summary Reporting#

For decision-making, you often need a concise summary of the causal effect. The effect_summary() method provides a decision-ready report with key statistics.

Note

OLS vs PyMC Models: When using OLS models (scikit-learn), the effect_summary() provides confidence intervals and p-values (frequentist inference), rather than the posterior distributions, HDI intervals, and tail probabilities provided by PyMC models (Bayesian inference). OLS tables include: mean, CI_lower, CI_upper, and p_value, but do not include median, tail probabilities (P(effect>0)), or ROPE probabilities.

# Generate effect summary for the full post-period
stats = result.effect_summary()
stats.table

	mean	ci_lower	ci_upper	p_value	relative_mean	relative_ci_lower	relative_ci_upper
average	1.845561	0.161437	3.529686	0.032645	3.366709	0.150601	6.582817
cumulative	66.440209	5.811718	127.068701	0.032645	121.201506	5.421618	236.981394

# View the prose summary
print(stats.text)

Post-period (2017-01-31 00:00:00 to 2019-12-31 00:00:00), the average effect was 1.85 (95% CI [0.16, 3.53]), with a p-value of 0.033. The cumulative effect was 66.44 (95% CI [5.81, 127.07]); p-value 0.033. Relative to the counterfactual, this equals 3.37% on average (95% CI [0.15%, 6.58%]).