Group Method of Data Handling (GMDH)-type neural network algorithms are heuristic self-organization methods for the modelling of complex systems. GMDH algorithms are utilized for a variety of purposes, including the identification of physical laws, the extrapolation of physical fields, pattern recognition, clustering, the approximation of multidimensional processes, and forecasting without models. In this study, the R package GMDH is presented for short term forecasting through GMDH-type neural network algorithms. The GMDH package has options to use different transfer functions (sigmoid, radial basis, polynomial, and tangent functions) simultaneously or separately. Data on the cancer death rate of Pennsylvania from 1930 to 2000 are used to illustrate the features of the GMDH package. Results based on ARIMA models and exponential smoothing methods are included for comparison.
Time series data are ordered successive observations measured at equally or unequally spaced points in time. Time series data may include dependency among successive observations; hence, the order of the data is important. Time series data appear in various areas and disciplines such as medical studies, economics, the energy industry, agriculture, and meteorology. Modelling time series data uses the history of the data to make forecasts. At times, statistical models are not sufficient to solve some problems; examples include pattern recognition, forecasting, and identification. Extracting information directly from the measurements is advantageous when modelling complex systems for which there is not enough prior information and/or no theory to build a model. Selecting a model automatically is a powerful approach for researchers who are interested in the result but do not have sufficient statistical knowledge or sufficient time for an analysis (Mueller et al. 1998).
The objective of this study is to develop an R package for forecasting of time series data. Some recent software packages developed for time series analysis are glarma, ftsa, MARSS, ensembleBMA, ProbForecastGOP, and forecast (Hyndman and Khandakar 2008; Fraley et al. 2011; Holmes, Ward, and Wills 2012; Shang 2013; Dunsmuir and Scott 2015). In this study, we focus on the development of an R package for short term forecasting via Group Method of Data Handling (GMDH) algorithms. The history of GMDH-type neural networks goes back to works from the end of the 1960s and the beginning of the 1970s. First, Ivakhnenko (1966) introduced a polynomial, which is the basic building block of GMDH, to construct higher order polynomials. Ivakhnenko (1970) then introduced heuristic self-organization methods, which constitute the main working system of the GMDH algorithm. A heuristic self-organization method defines the way the algorithm evolves, following rules such as external criteria. The GMDH method, convenient for complex and unstructured systems, has benefits over high order regression (Farlow 1981).
Kondo (1998) proposed a GMDH-type neural network in which the algorithm works according to the heuristic self-organization method. Kondo and Ueno (2006a,b) proposed a GMDH algorithm with a feedback loop. In this algorithm, the output obtained from the last layer is set as a new input variable, provided a threshold criterion is not satisfied in the previous layer. The algorithm is organized by a heuristic self-organization method in which a sigmoid transfer function is integrated. Kondo and Ueno (2007) proposed a logistic GMDH-type neural network; the difference from a conventional GMDH algorithm is that the new one takes linear functions of all inputs at the last layer. Kondo and Ueno (2012) included three transfer functions (sigmoid, radial basis and polynomial functions) in the feedback GMDH algorithm. Srinivasan (2008) used a GMDH-type neural network and traditional time series models to forecast energy demand, and showed that the GMDH-type neural network was superior to the traditional time series models with respect to mean absolute percentage error (MAPE). In another study, Xu et al. (2012) applied a GMDH algorithm and ARIMA models to forecast the daily power load; according to their results, the GMDH-based results were superior to those of the ARIMA models in terms of MAPE for forecasting performance.
There are some difficulties when applying a GMDH-type neural network. For example, there has been no freely available software implementing the GMDH algorithms in the literature. We present the R package GMDH for short term forecasting through GMDH-type neural network algorithms. The package includes two types of GMDH structures: the GMDH structure and the revised GMDH (RGMDH) structure. It also offers options to use different transfer functions (sigmoid, radial basis, polynomial, and tangent functions) simultaneously or separately. Data on the cancer death rate of Pennsylvania from 1930 to 2000 are used to illustrate the use of the GMDH package. We compare the results to those based on ARIMA models and exponential smoothing (ES) methods.
In this section, data preparation, two types of GMDH-type neural network structures, and estimation of a regularization parameter in regularized least square estimation (RLSE) are given.
Data preparation has an important role in GMDH-type neural network algorithms. To get rid of very big numbers in the calculations and to be able to use all transfer functions in the algorithm, it is necessary for the range of the data to lie in the interval (0, 1). If $x_t$ is the observed time series at time point $t$ $(t = 1, 2, \ldots, n)$, the series is scaled as

$$ w_t = \frac{x_t - x_{\min}}{x_{\max} - x_{\min}} $$

with $x_{\min} = \min_t x_t$ and $x_{\max} = \max_t x_t$. During the estimation and forecasting process in GMDH-type neural network algorithms, all calculations are done using the scaled data set $w_t$; forecasts are transformed back to the original scale of $x_t$ at the end.
Let’s assume a time series dataset of $n$ scaled observations $w_1, w_2, \ldots, w_n$, and let $p$ be the number of lagged observations used as inputs. The data are organized as in Table 1: each subject (row) pairs a response $w_t$ with its $p$ lagged values as inputs.

Subject | Response | Input 1 | Input 2 | ... | Input $p$
---|---|---|---|---|---
1 | $w_{p+1}$ | $w_p$ | $w_{p-1}$ | ... | $w_1$
2 | $w_{p+2}$ | $w_{p+1}$ | $w_p$ | ... | $w_2$
3 | $w_{p+3}$ | $w_{p+2}$ | $w_{p+1}$ | ... | $w_3$
... | ... | ... | ... | ... | ...
$n-p$ | $w_n$ | $w_{n-1}$ | $w_{n-2}$ | ... | $w_{n-p}$
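This data preparation step can be sketched in a few lines of base R. The code below is illustrative only (the series values are made up, and the variable names are not the package's internals): it applies min-max scaling and then uses `embed` to build the lagged design of Table 1.

```r
# Scale a toy series into the unit interval and build the lagged design of Table 1
x <- c(10, 12, 15, 13, 18, 20, 19, 24, 22, 25)   # made-up series
w <- (x - min(x)) / (max(x) - min(x))            # min-max scaling

p <- 3                                           # number of lagged inputs
E <- embed(w, p + 1)                             # row t: w_t, w_(t-1), ..., w_(t-p)
y <- E[, 1]                                      # response column
X <- E[, -1, drop = FALSE]                       # the p lagged input columns
dim(X)                                           # (n - p) rows, p columns
```

Each of the `n - p` rows of `cbind(y, X)` corresponds to one subject of Table 1.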
A model that better explains the relation between the response and the lagged time series is captured via transfer functions. The sigmoid, radial basis, polynomial, and tangent functions, presented in Table 2, are mainly used to explain the relation between inputs and output in GMDH-type neural network algorithms (Kondo and Ueno 2012). We use all transfer functions stated in Table 2 simultaneously in each neuron. In other words, we construct four models at each neuron, and the model which gives the smallest prediction mean square error (PMSE) is selected as the current transfer function at the corresponding neuron.
Sigmoid Function | $z = \dfrac{1}{1 + e^{-y}}$
Radial Basis Function | $z = e^{-y^2}$
Polynomial Function | $z = y$
Tangent Function | $z = \tan(y)$
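For concreteness, the four candidate transfer functions of Table 2 can be written directly in R. This is an illustrative sketch, not the package's internal code; in the algorithm each function is applied to the neuron's model output $y$ and the one with the smallest PMSE is kept.

```r
# Candidate transfer functions applied to a neuron's model output y
sigmoid <- function(y) 1 / (1 + exp(-y))   # maps into (0, 1)
radial  <- function(y) exp(-y^2)           # maps into (0, 1]
polynom <- function(y) y                   # identity (no transformation)
tangent <- function(y) tan(y)
```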
GMDH-type neural network algorithms are modelling techniques which learn the relations among the variables. From the time series perspective, the algorithm learns the relationship among the lags. After learning the relations, it automatically selects the path to follow in the algorithm. GMDH was first used by Ivakhnenko (1966) to construct a high order polynomial. The following equation, known as the Ivakhnenko polynomial, is given by

$$ y = a + \sum_{i=1}^{m} b_i x_i + \sum_{i=1}^{m} \sum_{j=1}^{m} c_{ij} x_i x_j + \ldots $$

where $m$ is the number of input variables (here, the lagged observations) and $a$, $b_i$, $c_{ij}$, ... are the coefficients to be estimated. In most applications, a second-order polynomial of two inputs is used in each neuron,

$$ y = a + b_1 x_i + b_2 x_j + b_3 x_i x_j + b_4 x_i^2 + b_5 x_j^2, \qquad (6) $$

which has six coefficients. The GMDH algorithm considers all pairwise combinations of the $p$ lagged input variables; each combination enters one neuron, where the model in equation (6) is used to predict the desired output.
The GMDH algorithm is a system of layers in which there exist neurons. The number of neurons in a layer is determined by the number of input variables. To illustrate, assume that the number of input variables is equal to $p$; since all pairwise combinations of the inputs are considered, the number of neurons in a layer equals $\binom{p}{2}$. In the GMDH architecture shown in Figure 1, since the number of inputs is equal to four, the number of neurons in a layer is determined to be $\binom{4}{2} = 6$. This is just the starting layer of the algorithm.
The coefficients of equation (6) are estimated in each neuron. Using the estimated coefficients and the input variables in each neuron, the desired output is predicted. According to a chosen external criterion, such as the prediction mean square error (PMSE), the best-performing neurons are selected and their outputs become the input variables of the next layer. This process is repeated for the specified number of layers (the layer argument in the package), and the model in the neuron with the best criterion value in the last layer is used for forecasting.
In a GMDH algorithm, there exist six coefficients to be estimated in each model. Coefficients are estimated via RLSE.
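A single neuron's fit can be sketched as follows. This is illustrative code under stated assumptions, not the package's implementation: it builds the six-term design of equation (6) on simulated inputs and applies a ridge-type RLSE formula with a fixed regularization parameter.

```r
# RLSE for one neuron: y = a + b1*xi + b2*xj + b3*xi*xj + b4*xi^2 + b5*xj^2
rlse <- function(Z, y, lambda) {
  Z <- cbind(1, Z)                          # add intercept column
  solve(t(Z) %*% Z + lambda * diag(ncol(Z)), t(Z) %*% y)
}

set.seed(1)
xi <- runif(30); xj <- runif(30)            # the two inputs entering the neuron
Z  <- cbind(xi, xj, xi * xj, xi^2, xj^2)    # five regressors + intercept = 6 coefficients
y  <- 0.2 + 0.5 * xi - 0.3 * xj + rnorm(30, sd = 0.01)
coefs <- rlse(Z, y, lambda = 0.01)
length(coefs)                               # 6
```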
A GMDH-type neural network constructs the algorithm by investigating the relation between two inputs and the desired output. The architecture of a revised GMDH (RGMDH)-type neural network does not only consider this relation, but also considers the individual effects of the inputs on the desired output (Kondo and Ueno 2006b). There are two different types of neurons in an RGMDH-type neural network. The first type of neuron is the same as in the GMDH-type neural network, given in equation (6); that is, two inputs enter the neuron and one output goes out. In the second type of neuron, $r$ ($r \leq p$) inputs enter the neuron and one output goes out via the linear model

$$ y = a + \sum_{k=1}^{r} b_k x_k, \qquad r \leq p, $$

where $x_1, \ldots, x_r$ are the inputs and $a$, $b_1, \ldots, b_r$ are the coefficients to be estimated. As mentioned above, there exist $p(p-1)/2$ neurons of the first type in a layer; with the $p$ neurons of the second type, a layer of an RGMDH-type neural network contains $p(p-1)/2 + p$ neurons in total, and neuron selection proceeds as in the GMDH algorithm.
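The two neuron types differ only in their design matrices. The short sketch below (hypothetical helper names, not package code) contrasts the six-coefficient quadratic design of the first type with the $r + 1$-coefficient linear design of the second type.

```r
# First type (GMDH and RGMDH): quadratic model of a pair of inputs -> 6 coefficients
design_pair   <- function(xi, xj) cbind(1, xi, xj, xi * xj, xi^2, xj^2)
# Second type (RGMDH only): linear model of r <= p inputs -> r + 1 coefficients
design_linear <- function(X) cbind(1, X)

xi <- c(0.1, 0.4, 0.7); xj <- c(0.2, 0.5, 0.9)
ncol(design_pair(xi, xj))                   # 6
ncol(design_linear(cbind(xi, xj)))          # 3, i.e. r + 1 with r = 2
```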
In each estimation step, there exist coefficients to be estimated. While estimating these coefficients, we use the regularized least square estimation (RLSE) method. Regularized least square estimation is utilized when there is the possibility of a multi-collinearity problem; it integrates a regularization parameter (penalizing term) $\lambda$ into the ordinary least square estimation, so that for a design matrix $X$ and response vector $y$ the coefficient estimates become $\hat{\beta} = (X^\top X + \lambda I)^{-1} X^\top y$.
We integrate the estimation of the regularization parameter (penalizing term) via validation in the GMDH algorithms. For this purpose, we divide the data into two parts: a learning set and a testing set. In the GMDH package, 70% of the observations are assigned to the learning set and 30% to the testing set by default; this proportion is controlled by the weight argument. The regularization parameter is then estimated as follows:

1. Specify the candidate regularization parameters $\lambda$; the defaults in the package are 0, 0.01, 0.02, 0.04, 0.08, ..., 10.24.
2. For each candidate $\lambda$, estimate the coefficients using the learning set.
3. After the calculation of the coefficients, calculate the predicted values on the testing set to obtain the MSE for each regularization parameter.
4. Select the regularization parameter which gives the minimum MSE value.
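These steps can be sketched in a few lines of R. The code is illustrative (the function name and the simulated data are hypothetical, and the 70/30 split mirrors the weight = 0.70 setting used in the application below); it fits on the learning set, scores MSE on the testing set, and keeps the best lambda.

```r
# Choose lambda by validation: fit on the learning set, score MSE on the testing set
select_lambda <- function(X, y, lambdas, weight = 0.70) {
  X     <- cbind(1, X)                               # intercept column
  n     <- length(y)
  learn <- seq_len(floor(weight * n))                # first 70% of subjects
  test  <- setdiff(seq_len(n), learn)                # remaining 30%
  mse <- vapply(lambdas, function(lam) {
    b <- solve(t(X[learn, ]) %*% X[learn, ] + lam * diag(ncol(X)),
               t(X[learn, ]) %*% y[learn])
    mean((y[test] - X[test, ] %*% b)^2)              # MSE on the testing set
  }, numeric(1))
  lambdas[which.min(mse)]                            # lambda with minimum MSE
}

set.seed(2)
X <- matrix(runif(120), ncol = 2)                    # toy design, 60 subjects
y <- 0.3 + 0.6 * X[, 1] + rnorm(60, sd = 0.05)
best <- select_lambda(X, y, c(0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32))
```

After this selection, the final coefficients are re-estimated from all observations with the chosen lambda.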
The data used in this application of the GMDH package are the yearly cancer death rate (per 100,000 population) in Pennsylvania between 1930 and 2000. The data were documented in the Pennsylvania Vital Statistics Annual Report by the Pennsylvania Department of Health in 2000 (Wei 2006). This dataset is also available in the package GMDH. After installing the GMDH package, it can be loaded into an R workspace by
R> library("GMDH")
R> data("cancer") # load cancer data
After the cancer death rate data set is loaded, one may use the fcast function in the GMDH package for short term forecasting. To utilize the GMDH structure for forecasting, method is set to "GMDH". One should set the method to "RGMDH" to use the RGMDH structure.
R> out = fcast(cancer[1:66], method = "GMDH", input = 15, layer = 1, f.number = 5,
level = 95, tf = "all", weight = 0.70, lambda = c(0, 0.01, 0.02, 0.04, 0.08, 0.16,
0.32, 0.64, 1.28, 2.56, 5.12, 10.24))
Point Forecast Lo 95 Hi 95
67 249.5317 244.9798 254.0836
68 249.6316 244.4891 254.7741
69 248.9278 243.0318 254.8239
70 247.0385 240.7038 253.3731
71 244.7211 237.1255 252.3168
# display fitted values
R> out$fitted
# return residuals
R> out$residuals
# show forecasts
R> out$mean
In this part, we divided the data into two parts in order to assess the prediction and forecasting ability of the methods: the first 66 observations were used for model fitting and the last five were reserved for forecasting. The auto.arima and ets functions from the forecast package, which use a grid search, select the best model according to either AIC, AICc or BIC. For this data set, the functions suggested the model ARIMA (1, 1, 0) with intercept and an ES method with multiplicative errors, additive damped trend and no seasonality (M, Ad, N), respectively. We also added the model ARIMA (0, 1, 0) with intercept, suggested for this data set by Wei (2006). For all models, the prediction mean square error (PMSE) and the forecasting mean square error (FMSE) are stated in Table 3.
Method | PMSE | FMSE
---|---|---
GMDH | 4.985 | 4.575
RGMDH | 4.287 | 4.102
ARIMA(1, 1, 0) with intercept | 5.995 | 81.874
ARIMA(0, 1, 0) with intercept | 6.324 | 73.756
ES (M, Ad, N) | 6.153 | 17.508
The best forecasting performance belongs to the RGMDH algorithm, and its prediction accuracy is also better than that of the GMDH, ARIMA and ES models. Moreover, the GMDH algorithm outperforms the ARIMA and ES models in both prediction and forecasting. To avoid visual clutter in Figure 4, we include only the predictions and forecasts of the RGMDH algorithm and ES (M, Ad, N).
In this study, we used GMDH-type neural network algorithms, heuristic self-organization methods for the modelling of complex systems, to make forecasts for time series data sets. Our primary focus was to develop a free software implementation. Concretely, we developed the R package GMDH to make short term forecasts via GMDH-type neural network algorithms. We also included different transfer functions (sigmoid, radial basis, polynomial, and tangent functions) in the GMDH package. Our R package allows these functions to be used simultaneously or separately, as desired.
In the estimation of the coefficients, since we construct the model on lagged data, there is a high possibility of a multi-collinearity problem. Therefore, we utilized regularized least square estimation to handle such occurrences. It is important to note that the estimation of the regularization parameter is a question of interest. Validation was applied in order to estimate the regularization term. After selection of the regularization term, the coefficients were estimated with the help of all observations and the selected regularization parameter.
Application of the algorithms on a real life dataset suggests improved performance of GMDH-type neural network algorithms over ARIMA and ES models in prediction and short term forecasting. Researchers are able to use GMDH algorithms easily since our R package GMDH is available on Comprehensive R Archive Network (CRAN) at http://CRAN.R-project.org/package=GMDH.
Future studies are planned in the direction of transfer functions. In this study, we integrated four different transfer functions (sigmoid, radial basis, polynomial, and tangent functions) into the GMDH algorithms. We plan to integrate the Box-Cox transformation into the GMDH algorithms as well. GMDH algorithms with the four transfer functions and GMDH algorithms with the Box-Cox transformation will then be applied to real data to compare their prediction and short term forecasting performance. After being documented, the GMDH algorithms with the Box-Cox transformation will be implemented in the R package GMDH.
We thank the anonymous reviewers for their constructive comments and suggestions which helped us to improve the quality of our paper.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Dag & Yozgatligil, "GMDH: An R Package for Short Term Forecasting via GMDH-Type Neural Network Algorithms", The R Journal, 2016
BibTeX citation
@article{RJ-2016-028,
  author  = {Dag, Osman and Yozgatligil, Ceylan},
  title   = {GMDH: An R Package for Short Term Forecasting via GMDH-Type Neural Network Algorithms},
  journal = {The R Journal},
  year    = {2016},
  note    = {https://rjournal.github.io/},
  volume  = {8},
  issue   = {1},
  issn    = {2073-4859},
  pages   = {379-386}
}