Title: | Out-of-Sample Time Series Forecasting |
---|---|
Description: | A comprehensive and cohesive API for the out-of-sample forecasting workflow: data preparation, forecasting - including both traditional econometric time series models and modern machine learning techniques - forecast combination, model and error analysis, and forecast visualization. |
Authors: | Tyler J. Pike [aut, cre] |
Maintainer: | Tyler J. Pike <[email protected]> |
License: | GPL-3 |
Version: | 1.0.0 |
Built: | 2024-10-30 03:47:22 UTC |
Source: | https://github.com/tylerjpike/oos |
Chart forecasts
chart_forecast(Data, Title, Ylab, Freq, zeroline = FALSE)
Data |
data.frame: oos.forecast object |
Title |
string: chart title |
Ylab |
string: y-axis label |
Freq |
string: frequency (acts as sub-title) |
zeroline |
boolean: if TRUE then add a horizontal line at zero |
ggplot2 chart
# simple time series
A = c(1:100) + rnorm(100)
date = seq.Date(from = as.Date('2000-01-01'), by = 'month', length.out = 100)
Data = data.frame(date = date, A)

# run forecast_univariate
forecast.uni =
  forecast_univariate(
    Data = Data,
    forecast.dates = tail(Data$date,10),
    method = c('naive','auto.arima', 'ets'),
    horizon = 1,
    recursive = FALSE,
    freq = 'month')

forecasts =
  dplyr::left_join(
    forecast.uni,
    data.frame(date, observed = A),
    by = 'date'
  )

# chart forecasts
chart.forecast =
  chart_forecast(
    forecasts,
    Title = 'test',
    Ylab = 'Index',
    Freq = 'Monthly',
    zeroline = TRUE)
Chart forecast errors
chart_forecast_error(Data, Title, Ylab, Freq, zeroline = FALSE)
Data |
data.frame: oos.forecast object |
Title |
string: chart title |
Ylab |
string: y-axis label |
Freq |
string: frequency (acts as sub-title) |
zeroline |
boolean: if TRUE then add a horizontal line at zero |
ggplot2 chart
# simple time series
A = c(1:100) + rnorm(100)
date = seq.Date(from = as.Date('2000-01-01'), by = 'month', length.out = 100)
Data = data.frame(date = date, A)

# run forecast_univariate
forecast.uni =
  forecast_univariate(
    Data = Data,
    forecast.dates = tail(Data$date,10),
    method = c('naive','auto.arima', 'ets'),
    horizon = 1,
    recursive = FALSE,
    freq = 'month')

forecasts =
  dplyr::left_join(
    forecast.uni,
    data.frame(date, observed = A),
    by = 'date'
  )

# chart forecast errors
chart.errors =
  chart_forecast_error(
    forecasts,
    Title = 'test',
    Ylab = 'Index',
    Freq = 'Monthly',
    zeroline = TRUE)
A function to impute missing values. It is used as a data preparation helper function and is called internally by forecast_univariate, forecast_multivariate, and forecast_combine.
data_impute(Data, method = "kalman", variables = NULL, verbose = FALSE)
Data |
data.frame: data frame of target variable, exogenous variables, and observed date (named 'date') |
method |
string: select which method to use from the imputeTS package; 'interpolation', 'kalman', 'locf', 'ma', 'mean', 'random', 'remove','replace', 'seadec', 'seasplit' |
variables |
string: vector of variables to impute, default is all but 'date' column |
verbose |
boolean: show start-up status of impute.missing.routine |
data.frame with missing data imputed
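To make the idea of imputation concrete, here is a minimal base R sketch of two of the simpler strategies named above, 'interpolation' and 'locf'. It is an illustration only: it does not call imputeTS or data_impute, and the series `x` is made up for the example.

```r
# Hypothetical series with two missing values (assumption: illustration only)
x <- c(1.0, 2.0, NA, 4.0, NA, 6.0)
idx <- seq_along(x)

# 'interpolation'-style fill: linear interpolation between observed points,
# using base R's approx() as a stand-in for the imputeTS routine
filled <- approx(x = idx[!is.na(x)], y = x[!is.na(x)], xout = idx)$y

# 'locf'-style fill: carry the last observation forward
locf <- x
for (i in idx[-1]) {
  if (is.na(locf[i])) locf[i] <- locf[i - 1]
}

print(filled)
print(locf)
```

The default 'kalman' method is model-based and is not reproduced here.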
A function to clean outliers. It is used as a data preparation helper function and is called internally by forecast_univariate, forecast_multivariate, and forecast_combine.
data_outliers( Data, variables = NULL, w.bounds = c(0.05, 0.95), trim = FALSE, cross_section = FALSE )
Data |
data.frame: data frame of target variable, exogenous variables, and observed date (named 'date') |
variables |
string: vector of variables to clean of outliers, default is all but 'date' column |
w.bounds |
double: vector of winsorizing minimum and maximum bounds, c(min percentile, max percentile) |
trim |
boolean: if TRUE then replace outliers with NA instead of winsorizing bound |
cross_section |
boolean: if TRUE then remove outliers based on cross-section (row-wise) instead of historical data (column-wise) |
data.frame with outliers winsorized or trimmed
A function to estimate principal components.
data_reduction(Data, variables = NULL, ncomp, standardize = TRUE)
Data |
data.frame: data frame of target variable, exogenous variables, and observed date (named 'date') |
variables |
string: vector of variables to reduce, default is all but 'date' column |
ncomp |
int: number of factors to create |
standardize |
boolean: normalize variables (mean zero, variance one) before estimating factors |
data.frame with a date column and one column per principal component
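As a rough illustration of what estimating principal components with standardization involves, the sketch below uses base R's prcomp. It is not the package's implementation; the data and the output column layout (a 'date' column followed by the component scores) are assumptions made for the example.

```r
set.seed(1)

# Hypothetical data: three series observed over 50 months
date <- seq.Date(from = as.Date('2000-01-01'), by = 'month', length.out = 50)
X <- data.frame(A = rnorm(50), B = rnorm(50), C = rnorm(50))

ncomp <- 2

# center = TRUE and scale. = TRUE standardize each variable
# (mean zero, variance one) before the components are estimated
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# keep the first ncomp component scores alongside the date column
factors <- data.frame(date = date, pca$x[, 1:ncomp, drop = FALSE])

head(factors)
```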
A function to subset data recursively or with a rolling window to create a valid information set. It is used as a data preparation helper function and is called internally by forecast_univariate, forecast_multivariate, and forecast_combine.
data_subset(Data, forecast.date, rolling.window, freq)
Data |
data.frame: data frame of target variable, exogenous variables, and observed date (named 'date') |
forecast.date |
date: upper bound of information set |
rolling.window |
int: size of rolling window, NA if expanding window is used |
freq |
string: time series frequency; day, week, month, quarter, year; only needed for rolling window factors |
data.frame bounded by the given date range
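The difference between an expanding (recursive) and a rolling information set can be sketched in base R as follows. This is an illustration under assumptions, not the package's code: in particular, whether the rolling window counts the forecast date itself as one of its periods is assumed here.

```r
# Hypothetical monthly data
date <- seq.Date(from = as.Date('2000-01-01'), by = 'month', length.out = 24)
Data <- data.frame(date = date, A = 1:24)

forecast.date <- as.Date('2001-06-01')

# expanding window: everything observed up to and including forecast.date
expanding <- Data[Data$date <= forecast.date, ]

# rolling window: only the most recent rolling.window months
# (assumption: the window includes the forecast date itself)
rolling.window <- 6
lower <- seq(forecast.date, by = '-1 month', length.out = rolling.window)[rolling.window]
rolling <- Data[Data$date >= lower & Data$date <= forecast.date, ]

nrow(expanding)
nrow(rolling)
```

Either way, no observation dated after forecast.date enters the information set, which is what keeps the exercise out-of-sample.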
A function to calculate various loss functions, including MSE, RMSE, MAE, and MAPE.
forecast_accuracy(Data)
Data |
data.frame: data frame of forecasts, model names, and dates |
data.frame of numeric error results
# simple time series
A = c(1:100) + rnorm(100)
date = seq.Date(from = as.Date('2000-01-01'), by = 'month', length.out = 100)
Data = data.frame(date = date, A)

# run forecast_univariate
forecast.uni =
  forecast_univariate(
    Data = Data,
    forecast.dates = tail(Data$date,10),
    method = c('naive','auto.arima', 'ets'),
    horizon = 1,
    recursive = FALSE,
    freq = 'month')

forecasts =
  dplyr::left_join(
    forecast.uni,
    data.frame(date, observed = A),
    by = 'date'
  )

# forecast accuracy
forecast.accuracy = forecast_accuracy(forecasts)
A function to combine forecasts out-of-sample. Methods available include: uniform weights, median forecast, trimmed (winsorized) mean, n-best, ridge regression, lasso regression, elastic net, peLASSO, random forest, tree-based gradient boosting machine, and single-layer neural network. See the package website for the most up-to-date list of available models.
forecast_combine( Data, method = "unform", n.max = NULL, rolling.window = NA, trim = c(0.5, 0.95), burn.in = 1, parallel.dates = NULL )
Data |
data.frame: data frame of forecasted values to combine, assumes 'date' and 'observed' columns, but ‘observed’ is not necessary for all methods |
method |
string: the method to use; 'uniform', 'median', 'trimmed.mean', 'n.best', 'peLasso', 'lasso', 'ridge', 'elastic', 'RF', 'GBM', 'NN' |
n.max |
int: maximum number of forecasts to select in n.best method |
rolling.window |
int: size of rolling window to evaluate forecast error over, use entire period if NA |
trim |
numeric: a two element vector with the winsorizing bounds for the trimmed mean method; c(min, max) |
burn.in |
int: the number of periods to use in the first model estimation |
parallel.dates |
int: the number of cores available for parallel estimation |
data.frame with a row for each combination method and forecasted date
# simple time series
A = c(1:100) + rnorm(100)
B = c(1:100) + rnorm(100)
C = c(1:100) + rnorm(100)
date = seq.Date(from = as.Date('2000-01-01'), by = 'month', length.out = 100)
Data = data.frame(date = date, A, B, C)

# run forecast_multivariate
forecast.multi =
  forecast_multivariate(
    Data = Data,
    target = 'A',
    forecast.dates = tail(Data$date,5),
    method = c('ols','var'),
    horizon = 1,
    freq = 'month')

# include observed values
forecasts =
  dplyr::left_join(
    forecast.multi,
    data.frame(date, observed = A),
    by = 'date'
  )

# combine forecasts
combinations =
  forecast_combine(
    forecasts,
    method = c('uniform','median','trimmed.mean',
               'n.best','lasso','peLasso'),
    burn.in = 5,
    n.max = 2)
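The two simplest combination rules, uniform weights and the median forecast, can be written in a few lines of base R. The sketch below is illustrative only: the wide table of forecasts (one column per model) is a made-up layout for the example and is not necessarily the shape forecast_combine expects.

```r
# Hypothetical forecasts from three models, one column per model
forecasts <- data.frame(
  date = seq.Date(from = as.Date('2008-01-01'), by = 'month', length.out = 4),
  observed = c(10, 11, 12, 13),
  ols = c(9, 11, 13, 12),
  var = c(11, 12, 12, 15),
  ets = c(10, 13, 11, 14))

models <- forecasts[, c('ols', 'var', 'ets')]

# uniform weights: simple average across models for each date
uniform <- rowMeans(models)

# median forecast: robust to a single extreme model
med <- unname(apply(models, 1, median))

data.frame(date = forecasts$date, uniform = uniform, median = med)
```

Neither rule needs the 'observed' column, which is consistent with the note above that 'observed' is not required by every method.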
A function to compare forecasts. Options include: simple forecast error ratios, Diebold-Mariano test, and Clark and West test for nested models.
forecast_comparison( Data, baseline.forecast, test = "ER", loss = "MSE", horizon = NULL )
Data |
data.frame: data frame of forecasts, model names, and dates |
baseline.forecast |
string: column name of baseline (null hypothesis) forecasts |
test |
string: which test to use; ER = error ratio, DM = Diebold-Mariano, CW = Clark and West |
loss |
string: error loss function to use if creating forecast error ratio |
horizon |
int: horizon of forecasts being compared in DM and CW tests |
numeric test result
# simple time series
A = c(1:100) + rnorm(100)
date = seq.Date(from = as.Date('2000-01-01'), by = 'month', length.out = 100)
Data = data.frame(date = date, A)

# run forecast_univariate
forecast.uni =
  forecast_univariate(
    Data = Data,
    forecast.dates = tail(Data$date,10),
    method = c('naive','auto.arima', 'ets'),
    horizon = 1,
    recursive = FALSE,
    freq = 'month')

forecasts =
  dplyr::left_join(
    forecast.uni,
    data.frame(date, observed = A),
    by = 'date'
  )

# run ER (MSE)
er.ratio.mse =
  forecast_comparison(
    forecasts,
    baseline.forecast = 'naive',
    test = 'ER',
    loss = 'MSE')
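For intuition, a forecast error ratio under MSE loss can be computed by hand as below. This is a sketch with made-up numbers, and it assumes the ratio is defined as the candidate's loss divided by the baseline's loss, so that values below one favor the candidate; the package's exact convention is not confirmed here.

```r
# Hypothetical observed values and two competing forecasts
observed <- c(10, 11, 12, 13)
naive <- c(9, 10, 11, 12)    # baseline forecast
ets <- c(10, 11, 13, 13)     # candidate forecast

mse <- function(forecast, observed) mean((forecast - observed)^2)

# error ratio: candidate loss relative to baseline loss
# (assumption: candidate in the numerator, baseline in the denominator)
er <- mse(ets, observed) / mse(naive, observed)
er
```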
A function to compute the date being forecasted from the date the forecast was made, the horizon, and the time series frequency. It is used as a data preparation helper function and is called internally by forecast_univariate, forecast_multivariate, and forecast_combine.
forecast_date(forecast.date, horizon, freq)
forecast.date |
date: date forecast was made |
horizon |
int: periods ahead of forecast |
freq |
string: time series frequency; day, week, month, quarter, year; only needed for rolling window factors |
date vector
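The date arithmetic implied by the arguments above (a forecast made on one date, for a given number of periods ahead at a given frequency) can be sketched with base R's seq. This is an illustration of the idea rather than the package's implementation.

```r
forecast.date <- as.Date('2020-01-01')

# three months ahead: step forward horizon periods and keep the last date
horizon <- 3
monthly <- seq(forecast.date, by = 'month', length.out = horizon + 1)[horizon + 1]

# two quarters ahead, using the same pattern
horizon <- 2
quarterly <- seq(forecast.date, by = 'quarter', length.out = horizon + 1)[horizon + 1]

monthly
quarterly
```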
A function to estimate multivariate forecasts out-of-sample. Methods available include: vector auto-regression, linear regression, lasso regression, ridge regression, elastic net, random forest, tree-based gradient boosting machine, and single-layer neural network. See the package website for the most up-to-date list of available models.
forecast_multivariate(
  Data,
  forecast.dates,
  target,
  horizon,
  method,
  rolling.window = NA,
  freq,
  lag.variables = NULL,
  lag.n = NULL,
  outlier.clean = FALSE,
  outlier.variables = NULL,
  outlier.bounds = c(0.05, 0.95),
  outlier.trim = FALSE,
  outlier.cross_section = FALSE,
  impute.missing = FALSE,
  impute.method = "kalman",
  impute.variables = NULL,
  impute.verbose = FALSE,
  reduce.data = FALSE,
  reduce.variables = NULL,
  reduce.ncomp = NULL,
  reduce.standardize = TRUE,
  parallel.dates = NULL,
  return.models = FALSE,
  return.data = FALSE
)
Data |
data.frame: data frame of target variable, exogenous variables, and observed date (named 'date'); may alternatively be a |
forecast.dates |
date: dates forecasts are created |
target |
string: column name in Data of variable to forecast |
horizon |
int: number of periods into the future to forecast |
method |
string: methods to use |
rolling.window |
int: size of rolling window, NA if expanding window is used |
freq |
string: time series frequency; day, week, month, quarter, year |
lag.variables |
string: vector of variables to lag each time step, if lag.n is not null then the default is all non-date variables |
lag.n |
int: number of lags to create |
outlier.clean |
boolean: if TRUE then clean outliers |
outlier.variables |
string: vector of variables to purge of outliers, default is all but 'date' column |
outlier.bounds |
double: vector of winsorizing minimum and maximum bounds, c(min percentile, max percentile) |
outlier.trim |
boolean: if TRUE then replace outliers with NA instead of winsorizing bound |
outlier.cross_section |
boolean: if TRUE then remove outliers based on cross-section (row-wise) instead of historical data (column-wise) |
impute.missing |
boolean: if TRUE then impute missing values |
impute.method |
string: select which method to use from the imputeTS package; 'interpolation', 'kalman', 'locf', 'ma', 'mean', 'random', 'remove','replace', 'seadec', 'seasplit' |
impute.variables |
string: vector of variables to impute missing values, default is all numeric columns |
impute.verbose |
boolean: show start-up status of impute.missing.routine |
reduce.data |
boolean: if TRUE then reduce dimension |
reduce.variables |
string: vector of variables to reduce, default is all numeric columns |
reduce.ncomp |
int: number of factors to create |
reduce.standardize |
boolean: normalize variables (mean zero, variance one) before estimating factors |
parallel.dates |
int: the number of cores available for parallel estimation |
return.models |
boolean: if TRUE then return list of models estimated at each forecast.date |
return.data |
boolean: if TRUE then return list of information.set for each forecast.date |
data.frame with a row for each forecast by model and forecasted date
# simple time series
A = c(1:100) + rnorm(100)
B = c(1:100) + rnorm(100)
C = c(1:100) + rnorm(100)
date = seq.Date(from = as.Date('2000-01-01'), by = 'month', length.out = 100)
Data = data.frame(date = date, A, B, C)

# run forecast_multivariate
forecast.multi =
  forecast_multivariate(
    Data = Data,
    target = 'A',
    forecast.dates = tail(Data$date,5),
    method = c('ols','var'),
    horizon = 1,
    # information set
    rolling.window = NA,
    freq = 'month',
    # data prep
    lag.n = 4,
    outlier.clean = TRUE,
    impute.missing = TRUE)
A function to estimate univariate forecasts out-of-sample. Methods available include all forecast methods from the forecast package. See the package website for the most up-to-date list of available models.
forecast_univariate(
  Data,
  forecast.dates,
  methods,
  horizon,
  recursive = TRUE,
  rolling.window = NA,
  freq,
  outlier.clean = FALSE,
  outlier.variables = NULL,
  outlier.bounds = c(0.05, 0.95),
  outlier.trim = FALSE,
  outlier.cross_section = FALSE,
  impute.missing = FALSE,
  impute.method = "kalman",
  impute.variables = NULL,
  impute.verbose = FALSE,
  parallel.dates = NULL,
  return.models = FALSE,
  return.data = FALSE
)
Data |
data.frame: data frame of variable to forecast and a date column; may alternatively be a |
forecast.dates |
date: dates forecasts are created |
methods |
string: models to estimate forecasts |
horizon |
int: number of periods to forecast |
recursive |
boolean: use sequential one-step-ahead forecast if TRUE, use direct projections if FALSE |
rolling.window |
int: size of rolling window, NA if expanding window is used |
freq |
string: time series frequency; day, week, month, quarter, year |
outlier.clean |
boolean: if TRUE then clean outliers |
outlier.variables |
string: vector of variables to purge of outliers, default is all but 'date' column |
outlier.bounds |
double: vector of winsorizing minimum and maximum bounds, c(min percentile, max percentile) |
outlier.trim |
boolean: if TRUE then replace outliers with NA instead of winsorizing bound |
outlier.cross_section |
boolean: if TRUE then remove outliers based on cross-section (row-wise) instead of historical data (column-wise) |
impute.missing |
boolean: if TRUE then impute missing values |
impute.method |
string: select which method to use from the imputeTS package; 'interpolation', 'kalman', 'locf', 'ma', 'mean', 'random', 'remove','replace', 'seadec', 'seasplit' |
impute.variables |
string: vector of variables to impute missing values, default is all numeric columns |
impute.verbose |
boolean: show start-up status of impute.missing.routine |
parallel.dates |
int: the number of cores available for parallel estimation |
return.models |
boolean: if TRUE then return list of models estimated at each forecast.date |
return.data |
boolean: if TRUE then return list of information.set for each forecast.date |
data.frame with a row for each forecast by model and forecasted date
# simple time series
A = c(1:100) + rnorm(100)
date = seq.Date(from = as.Date('2000-01-01'), by = 'month', length.out = 100)
Data = data.frame(date = date, A)

# estimate univariate forecasts
forecast.uni =
  forecast_univariate(
    Data = Data,
    forecast.dates = tail(Data$date,5),
    method = c('naive','auto.arima', 'ets'),
    horizon = 1,
    recursive = FALSE,
    # information set
    rolling.window = NA,
    freq = 'month',
    # data prep
    outlier.clean = TRUE,
    impute.missing = TRUE)
data_impute model estimation
A function to create the data imputation method arguments list for user manipulation.
instantiate.data_impute.control_panel()
data_impute.control_panel
forecast_combine model estimation
A function to create the forecast combination technique arguments list for user manipulation.
instantiate.forecast_combinations.control_panel(covariates = NULL)
covariates |
int: the number of features that will go into the model |
forecast_combinations.control_panel
forecast_multivariate ML estimation
A function to create the multivariate forecast methods arguments list for user manipulation.
instantiate.forecast_multivariate.ml.control_panel( covariates = NULL, rolling.window = NULL, horizon = NULL )
covariates |
int: the number of features that will go into the model |
rolling.window |
int: size of rolling window, NA if expanding window is used |
horizon |
int: number of periods into the future to forecast |
forecast_multivariate.ml.control_panel
forecast_multivariate VAR estimation
A function to create the multivariate forecast methods arguments list for user manipulation.
instantiate.forecast_multivariate.var.control_panel()
forecast_multivariate.var.control_panel
forecast_univariate model estimation
A function to create the univariate forecast method arguments list for user manipulation.
instantiate.forecast_univariate.control_panel()
forecast_univariate.control_panel
A function to calculate various error loss functions. Options include: MSE, RMSE, MAE, and MAPE. The default is MSE loss.
loss_function(forecast, observed, metric = "MSE")
forecast |
numeric: vector of forecasted values |
observed |
numeric: vector of observed values |
metric |
string: loss function |
numeric test result
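The four loss functions named above have standard textbook definitions, sketched here in base R. The helper name `loss_sketch` is hypothetical, and reporting MAPE as a fraction rather than a percentage is an assumption; the package may scale it differently.

```r
# Hypothetical helper mirroring the documented arguments (not the package's code)
loss_sketch <- function(forecast, observed, metric = 'MSE') {
  e <- forecast - observed
  switch(metric,
    MSE = mean(e^2),                  # mean squared error
    RMSE = sqrt(mean(e^2)),           # root mean squared error
    MAE = mean(abs(e)),               # mean absolute error
    MAPE = mean(abs(e / observed)),   # mean absolute percentage error (as a fraction)
    stop('unknown metric'))
}

forecast <- c(2, 4, 6)
observed <- c(1, 4, 8)

loss_sketch(forecast, observed)            # default: MSE
loss_sketch(forecast, observed, 'RMSE')
loss_sketch(forecast, observed, 'MAE')
loss_sketch(forecast, observed, 'MAPE')
```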
A function to create 1 through n lags of a set of variables. It is used as a data preparation helper function and is called internally by forecast_univariate, forecast_multivariate, and forecast_combine.
n.lag(Data, lags, variables = NULL)
Data |
data.frame: data frame of variables to lag and a 'date' column |
lags |
int: number of lags to create |
variables |
string: vector of variable names to lag, default is all non-date variables |
data.frame
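Creating lags 1 through n of a variable amounts to shifting the column down and padding the top with NA. The base R sketch below shows the idea; the output column names ('A.l1', 'A.l2') are hypothetical and may not match what n.lag produces.

```r
# Hypothetical data with a 'date' column and one variable to lag
Data <- data.frame(
  date = seq.Date(from = as.Date('2000-01-01'), by = 'month', length.out = 5),
  A = 1:5)

lags <- 2
lagged <- Data

# for each lag k, shift A down by k rows and pad the first k rows with NA
for (k in seq_len(lags)) {
  lagged[[paste0('A.l', k)]] <- c(rep(NA, k), head(Data$A, -k))
}

lagged
```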
A function to subset the n-best forecasts; assumes a column named 'observed'.
NBest(forecasts, n.max, window = NA)
forecasts |
data.frame: a data frame of forecasts to combine, assumes one column named "observed" |
n.max |
int: maximum number of forecasts to select |
window |
int: size of rolling window to evaluate forecast error over, use entire period if NA |
data.frame with n columns of the historically best forecasts
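One way to picture n-best selection is to rank models by a historical loss and keep the top n.max. The sketch below ranks by MSE over the whole sample; both the choice of MSE and the use of the full history (rather than a rolling window) are assumptions for illustration, not a description of NBest's internals.

```r
# Hypothetical forecasts with the required 'observed' column
forecasts <- data.frame(
  observed = c(10, 11, 12, 13),
  ols = c(9, 11, 13, 12),
  var = c(11, 12, 12, 15),
  ets = c(10, 11, 11, 14))

n.max <- 2
models <- setdiff(names(forecasts), 'observed')

# historical MSE of each model against the observed values
mse <- sapply(models, function(m) mean((forecasts[[m]] - forecasts$observed)^2))

# keep the n.max models with the smallest error
best <- names(sort(mse))[seq_len(n.max)]
n.best <- forecasts[, best, drop = FALSE]

best
rowMeans(n.best)   # simple average of the selected forecasts
```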
Standardize variables (mean 0, variance 1)
standardize(X)
X |
numeric: vector to be standardized |
numeric vector of standardized values
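Standardizing to mean zero and variance one is the usual z-score transformation. A one-line base R version is shown below; it uses the sample standard deviation (n - 1 denominator) from sd(), which is an assumption about the convention.

```r
X <- c(2, 4, 6, 8)

# subtract the mean and divide by the sample standard deviation
Z <- (X - mean(X)) / sd(X)

mean(Z)   # zero up to floating point error
sd(Z)     # one
```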
Winsorize or trim variables
winsorize(X, bounds, trim = FALSE)
X |
numeric: vector to be winsorized or trimmed |
bounds |
double: vector of winsorizing minimum and maximum bounds, c(min percentile, max percentile) |
trim |
boolean: if TRUE then replace outliers with NA instead of winsorizing bound |
numeric vector of winsorized or trimmed values
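The difference between winsorizing and trimming can be seen in a short base R sketch. The function `winsorize_sketch` is a hypothetical stand-in that follows the documented arguments; it relies on quantile() with its default interpolation, which may differ from how the package computes the bounds.

```r
# Hypothetical stand-in following the documented arguments (not the package's code)
winsorize_sketch <- function(X, bounds, trim = FALSE) {
  q <- quantile(X, probs = bounds, na.rm = TRUE)
  if (trim) {
    # trimming: values outside the bounds become NA
    X[X < q[1] | X > q[2]] <- NA
  } else {
    # winsorizing: values outside the bounds are pulled back to the bound
    X[X < q[1]] <- q[1]
    X[X > q[2]] <- q[2]
  }
  X
}

X <- c(1:9, 100)

w <- winsorize_sketch(X, bounds = c(0.1, 0.9))
tr <- winsorize_sketch(X, bounds = c(0.1, 0.9), trim = TRUE)

w
tr
```

With these bounds the extreme value 100 is replaced by the 90th percentile when winsorizing and by NA when trimming.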